Best First Guess Word For Wordle

Wordle is all the rage lately.

I’ve done a deep dive figuring out the best first guess word for Wordle.

TL;DR: Skip to the last table, which ranks the best words two ways: by the chance of having the most letters in the correct spot, and by the chance of having the most letters in the word but in the wrong spot.

Find out more after the jump.

It seems obvious that to get the greatest number of matches, your first guess should contain as many vowels as possible. But I’ve looked at the Wordle word library in the source code of the official site, and I’ve discovered that a few consonants are better choices than i, u, and y. Note also that the Wordle source code contains 2,315 days of answers (all common 5-letter English words) and 10,657 other valid, less common 5-letter English words. I’ll call the words that can be “winners” the winning set, and the words that are guessable but will never win the non-winning set.

Below is the number of words within the winning set that contain each letter at least once:

e: 1056 s: 618 p: 346 v: 149
a: 909 n: 550 g: 300 x: 37
r: 837 u: 457 m: 298 z: 35
o: 673 c: 448 b: 267 q: 29
t: 667 y: 417 f: 207 j: 27
l: 648 h: 379 k: 202
i: 647 d: 370 w: 194

Unfortunately, no winning word uses all the top letters together (e, a, r, o and t, or l or i). So one might think a great opening word would be “arose”, with four of the five top letters and 4,093 possible partial or full matches.
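The 4,093 figure is simply the sum of the per-letter counts from the list above. A quick check, with the five relevant counts copied into a dictionary (the names `containing` and `overlap_score` are mine, not anything from the Wordle source):

```python
# Counts of winning-set words containing each letter, copied from the list above.
containing = {"e": 1056, "a": 909, "r": 837, "o": 673, "s": 618}

def overlap_score(word, table):
    """Sum the per-letter counts for each distinct letter in the word."""
    return sum(table[letter] for letter in set(word))

print(overlap_score("arose", containing))  # 909 + 837 + 673 + 618 + 1056 = 4093
```

Keep in mind that this sum counts a word once for every letter it shares with the guess, so it is a tally of letter matches rather than of distinct words.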

However, overall letter frequency isn’t the only important metric for picking a great first word. A letter may appear often overall, but one should also consider how often it appears in each position within the word. The fact that some letters can appear more than once in a word also throws a wrench in the gears. After investigating how often each letter appears in each position, I found the following totals (this time counting every occurrence in every position, so a letter repeated within a word is counted each time, rather than counting each word once):

e: 1233 s: 669 p: 367 v: 153
a: 979 n: 575 m: 316 z: 40
r: 899 c: 477 g: 311 x: 37
o: 754 u: 467 b: 281 q: 29
t: 729 y: 425 f: 230 j: 27
l: 719 d: 393 k: 211
i: 671 h: 389 w: 195

And here are the same numbers broken down into individual scores for each position. The highest number in each row marks the position where that letter appears most often.

letter 1st 2nd 3rd 4th 5th
e 72 242 177 318 424
a 141 304 307 163 64
r 105 267 163 152 212
o 41 279 244 132 58
t 149 77 111 139 253
l 88 201 112 162 156
i 34 202 266 158 11
s 366 16 80 171 36
n 37 87 139 182 130
c 198 40 56 152 31
u 33 186 165 82 1
y 6 23 29 3 364
d 111 20 75 69 118
h 69 144 9 28 139
p 142 61 58 50 56
m 107 38 61 68 42
g 115 12 67 76 41
b 173 16 57 24 11
f 136 8 25 35 26
k 20 10 12 56 113
w 83 44 26 25 17
v 43 15 49 46 0
z 3 2 11 20 4
x 0 14 12 3 8
q 23 5 1 0 0
j 20 2 3 2 0
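For anyone who wants to reproduce a table like this, here is a minimal sketch. It again uses a made-up five-word sample instead of the real winning set, and `position_counts` and `total_for` are names I invented for the example. Each row of the table sums to the totals listed earlier (for example, e: 72 + 242 + 177 + 318 + 424 = 1233), and the sketch mirrors that relationship.

```python
from collections import Counter

# Hypothetical stand-in for the winning set; the real list has 2,315 words.
winning_set = ["arose", "soapy", "stare", "bayou", "eerie"]

def position_counts(words, length=5):
    """Return one Counter per position: letter -> number of words with it there."""
    tables = [Counter() for _ in range(length)]
    for word in words:
        for i, letter in enumerate(word):
            tables[i][letter] += 1
    return tables

tables = position_counts(winning_set)

def total_for(letter):
    """Total occurrences across all positions (repeats within a word count)."""
    return sum(t[letter] for t in tables)

# Per-position counts for "e", followed by the row total.
print([t["e"] for t in tables], total_for("e"))
```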

[Graphs: letter frequency for each position, each sorted by highest frequency.]

[Graph: the same data combined.]

Note that the letters at the end of the list (v, x, z, q, j) should be avoided until you are confident you know the word, rather than guessed in the hope that they’ll at least show up as a partial match.

Other things I observed: when I noticed how few words end in “s” (only 36), I realized the winning set contains no plural words. If a word ends in “s”, it’s because the root of the word ends in “s” and not because of an “s” suffix.

Only one word ends in “u”. Out of curiosity I looked it up and it’s “bayou”. I’ve been told that “ADIEU” is a popular starting word, and while it is in the usable dictionary, it’s not in the winning set.

After hypothesizing that SOAPY is the best word, I wanted to compare it against a number of other suggested “best words for Wordle” by statistical likelihood based on position. I came up with the following results:

SOAPY vs AROSE

s:366 vs a:141 [+225]
o:279 vs r:267 [+12]
a:307 vs o:244 [+63]
p:50 vs s:171 [-121]
y:364 vs e:424 [-60]

AROSE gets up and leaves: SOAPY comes out ahead with +119 more possible matches.

Just as I started to feel confident that SOAPY really would be the best first word, mathematically speaking, I realized that there are not one but two important criteria to consider. The first is having the most letters in the word in the correct spot; the second is having the most letters in the word but in the wrong spot. I started to think the second metric might be more important. I’ll leave that choice up to you.

Here are the numbers:

Word     1st Criteria   2nd Criteria           Combined
         (position)     (overall likeliness)   (1st + 2nd)
ROATE¹   1254           4594                   5848
AROSE    1247           4534                   5781
TEARS    886            4509                   5395
STARE    1326           4509                   5835
ARISE    1269           4451                   5720
LATER    1033           4559                   5592
RATIO    736            4032                   4768
ROAST    1115           4030                   5145
ADIEU¹   746            3743                   4489
AUDIO    618            3264                   3882
SOAPY    1366           3194                   4560
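If you’d like to score your own favorite opener, here is a sketch that recomputes both criteria from the position table earlier in the post. The function names are mine, and treating the second criterion as the sum of each letter’s total across all five positions is my reading of how these numbers come about, so take it as an assumption rather than gospel.

```python
# Per-position counts copied from the position table above (positions 1 to 5).
POSITION = {
    "e": (72, 242, 177, 318, 424), "a": (141, 304, 307, 163, 64),
    "r": (105, 267, 163, 152, 212), "o": (41, 279, 244, 132, 58),
    "t": (149, 77, 111, 139, 253), "l": (88, 201, 112, 162, 156),
    "i": (34, 202, 266, 158, 11), "s": (366, 16, 80, 171, 36),
    "n": (37, 87, 139, 182, 130), "c": (198, 40, 56, 152, 31),
    "u": (33, 186, 165, 82, 1), "y": (6, 23, 29, 3, 364),
    "d": (111, 20, 75, 69, 118), "h": (69, 144, 9, 28, 139),
    "p": (142, 61, 58, 50, 56), "m": (107, 38, 61, 68, 42),
    "g": (115, 12, 67, 76, 41), "b": (173, 16, 57, 24, 11),
    "f": (136, 8, 25, 35, 26), "k": (20, 10, 12, 56, 113),
    "w": (83, 44, 26, 25, 17), "v": (43, 15, 49, 46, 0),
    "z": (3, 2, 11, 20, 4), "x": (0, 14, 12, 3, 8),
    "q": (23, 5, 1, 0, 0), "j": (20, 2, 3, 2, 0),
}

def position_score(word):
    """1st criterion: how often each letter appears in exactly that position."""
    return sum(POSITION[ch][i] for i, ch in enumerate(word.lower()))

def overall_score(word):
    """2nd criterion (assumed): sum of each letter's total over all positions."""
    return sum(sum(POSITION[ch]) for ch in word.lower())

def combined_score(word):
    """Combined: 1st criterion plus 2nd criterion."""
    return position_score(word) + overall_score(word)

for word in ("ROATE", "STARE", "SOAPY"):
    print(word, position_score(word), overall_score(word), combined_score(word))
```

For these three words the output matches the corresponding rows of the table. Note that a guess with a repeated letter would have that letter counted twice in `overall_score`; none of the words compared here has one.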

Conclusion: SOAPY wins on the first criterion, having the letters most likely to be in the right position, and ROATE wins on the second, having the most likely letters even if they are in the wrong position. ROATE also has the highest combined score, but STARE wins overall: its combined score is almost as good as ROATE’s, and it gets bonus points because it’s in the set of possible winning words.

Update: I just realized that the set of possible winning words is actually shrinking as each day passes. Words that have already won could be removed from that set, and eventually the best guesses could change based on what’s left in the winning set. The calculations I’ve made here are based on all the words that have ever been or ever will be winners from the original set.
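Accounting for that would mostly mean filtering the list before counting. A small sketch, where both lists are made-up samples and `remaining_words` is a name I chose for the example:

```python
from collections import Counter

# Hypothetical sample data; neither list is taken from the real game.
winning_set = ["arose", "soapy", "stare", "bayou", "eerie"]
already_used = ["soapy", "bayou"]

def remaining_words(words, used):
    """Drop words that have already been the daily answer."""
    used = set(used)
    return [w for w in words if w not in used]

remaining = remaining_words(winning_set, already_used)

# Recount first-position letters over the words that are still possible.
first_letters = Counter(word[0] for word in remaining)
print(remaining, first_letters)
```

The same filtered list could then be fed through any of the counts above to see whether the rankings shift.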

  1. These words exist in the guessable set but not the winning set.