Wordle: it’s all the rage lately.
I’ve done a deep dive figuring out the best first guess word for Wordle.
TL;DR: Skip to the last table, which ranks the best words by two criteria: the greatest chance of having the most letters in the correct spot, and the greatest chance of having the most letters in the word but in the wrong spot.
It seems obvious that to get the greatest number of matches, your first guess should contain as many vowels as possible. However, I’ve looked at the Wordle word library in the official site’s source code and discovered that a few consonants are better choices than i, u, and y. The source code contains 2,315 days of answers (all common 5-letter English words) and 10,657 other valid, less common 5-letter English words. I’ll call the words that can be selected as answers (“winners”) the winning set, and the words that are guessable but will never win the non-winning set.
Below is the number of words within the winning set that contain each letter at least once:
| e: 1056 | s: 618 | p: 346 | v: 149 |
| a: 909 | n: 550 | g: 300 | x: 37 |
| r: 837 | u: 457 | m: 298 | z: 35 |
| o: 673 | c: 448 | b: 267 | q: 29 |
| t: 667 | y: 417 | f: 207 | j: 27 |
| l: 648 | h: 379 | k: 202 | |
| i: 647 | d: 370 | w: 194 | |
Unfortunately, no winning word uses all the top letters together (e, a, r, o and t, or l or i), so one might think a great opening word would be “arose”, with four of the top five letters and 4,093 possible partial or full matches (909 + 837 + 673 + 618 + 1,056).
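The counts in the table above are the number of winning words that contain each letter at least once, and the 4,093 figure for “arose” is the sum of those counts over its letters. A minimal Python sketch of that tally, assuming the word list has already been loaded as lowercase strings (the five-word `sample` and the function names are hypothetical stand-ins, not the real 2,315-word list or the code behind these tables):

```python
from collections import Counter

def letter_word_counts(words):
    """Count how many words contain each letter at least once."""
    counts = Counter()
    for word in words:
        # set() so a repeated letter (e.g. the e's in "eerie") counts once per word
        counts.update(set(word))
    return counts

def presence_score(guess, counts):
    """Sum the word counts of each distinct letter in the guess."""
    return sum(counts[letter] for letter in set(guess))

# Hypothetical five-word sample standing in for the winning set.
sample = ["arose", "stare", "bayou", "eerie", "soapy"]
counts = letter_word_counts(sample)
print(counts["e"], presence_score("arose", counts))  # 3 16
```

Using `set(word)` is what keeps repeated letters from being counted twice here; the next table drops that rule.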
However, overall letter frequency isn’t the only important metric for picking a great first word. A letter may appear often overall, but you should also consider how often it appears in the correct location within the word. The fact that some letters can appear more than once in a word also throws a wrench in the gears. After investigating how often each letter appears in each position, I found the following data (this time each letter’s total is the sum of its matches across all five positions, so a word with two e’s counts twice, rather than the number of words that contain the letter):
| e: 1233 | s: 669 | p: 367 | v: 153 |
| a: 979 | n: 575 | m: 316 | z: 40 |
| r: 899 | c: 477 | g: 311 | x: 37 |
| o: 754 | u: 467 | b: 281 | q: 29 |
| t: 729 | y: 425 | f: 230 | j: 27 |
| l: 719 | d: 393 | k: 211 | |
| i: 671 | h: 389 | w: 195 | |
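Per-position counts can be tallied the same way. In this sketch (again with a hypothetical sample and helper names), each letter’s total is the sum of its counts across the five positions, which is why a letter like e can exceed the number of words that contain it:

```python
from collections import Counter

def positional_counts(words):
    """positions[i][c] = number of words with letter c at index i (0-based)."""
    positions = [Counter() for _ in range(5)]
    for word in words:
        for i, letter in enumerate(word):
            positions[i][letter] += 1
    return positions

def total_occurrences(positions):
    """Add up each letter's per-position counts; a word with two e's adds 2 to e."""
    total = Counter()
    for pos in positions:
        total.update(pos)  # Counter.update adds counts rather than replacing them
    return total

# Hypothetical sample, not the real winning set.
sample = ["arose", "stare", "bayou", "eerie", "soapy"]
positions = positional_counts(sample)
total = total_occurrences(positions)
print(total["e"], positions[4]["e"])  # 5 3
```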
Here are those per-letter totals broken down into individual scores for each position. I’ve set the cell background to grey to identify the highest score for that position on each letter.
These graphs illustrate the same data (each sorted by highest frequency):
And here is the combined graph:
It’s worth noting that the letters at the end of the list [v,x,z,q,j] should be avoided until you’re confident you know the word, rather than guessed in the hope that, if you’re wrong, they’ll at least show up as a partial match.
Another thing I observed: when I noticed the lack of words ending in “s”, I realized there are no plural words in the winning set. If a word ends in “s”, it’s because its root ends in “s”, not because of a plural “s” suffix.
Only one winning word ends in “u”. Out of curiosity I looked it up, and it’s “bayou”. I’ve been told that “ADIEU” is a popular starting word, and while it is in the usable dictionary, it’s not in the winning set.
After hypothesizing that SOAPY is the best word, I wanted to compare it against a number of other suggested “best words for Wordle” by their statistical likelihood of matching based on position. I came up with the following results:
SOAPY vs AROSE
s:366 vs a:141 [+225]
o:279 vs r:267 [+12]
a:307 vs o:279 [+28]
p:50 vs s:171 [-121]
y:364 vs e:424 [-60]
AROSE gets up and leaves: SOAPY comes out ahead with 84 more possible matches.
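The SOAPY vs AROSE comparison adds up, for each slot, how many winning words have the guess’s letter in that exact slot. A sketch of that scoring, assuming a positional table like the one built earlier (the `sample` list and function names are made up for illustration, so its numbers are not the ones quoted above):

```python
from collections import Counter

def positional_counts(words):
    """positions[i][c] = number of words with letter c at index i (0-based)."""
    positions = [Counter() for _ in range(5)]
    for word in words:
        for i, letter in enumerate(word):
            positions[i][letter] += 1
    return positions

def positional_score(guess, positions):
    """For each slot, count the words with the guess's letter in that slot, then sum."""
    return sum(positions[i][letter] for i, letter in enumerate(guess))

def compare(a, b, positions):
    """Print a per-slot breakdown in the style above and return the total difference."""
    total_diff = 0
    for i, (x, y) in enumerate(zip(a, b)):
        sx, sy = positions[i][x], positions[i][y]
        total_diff += sx - sy
        print(f"{x}:{sx} vs {y}:{sy} [{sx - sy:+d}]")
    return total_diff

# Hypothetical sample; the real comparison uses the 2,315 winning words.
sample = ["arose", "stare", "bayou", "eerie", "soapy", "shape"]
positions = positional_counts(sample)
print(positional_score("soapy", positions), positional_score("arose", positions))  # 10 8
print(compare("soapy", "arose", positions))  # per-slot lines, then 2
```

The value returned by `compare` is simply the sum of the bracketed per-slot differences.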
Just as I started to feel confident that SOAPY really was the best first word, mathematically speaking, I realized there are two important criteria to consider, not one. The first is having the most letters in the correct spot; the second is having the most letters that are in the word but in the wrong spot. I began to suspect the second metric might be more important, but I’ll leave that choice up to you.
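The second criterion can be defined in more than one way. One plausible way to make both criteria concrete is to score a guess against every word in the winning set using Wordle’s tile rules: count green tiles (right letter, right spot) and yellow tiles (right letter, wrong spot), where a repeated letter only earns a yellow while an unmatched copy remains in the answer. This is an assumed definition, not necessarily the one used for the numbers that follow, and the function names and sample list are hypothetical. Summed over all answers, the green total equals the positional score used in the SOAPY vs AROSE comparison.

```python
from collections import Counter

def feedback(guess, answer):
    """Return (greens, yellows) for one guess against one answer, using Wordle's tile rules."""
    greens = sum(g == a for g, a in zip(guess, answer))
    # Answer letters not already claimed by a green tile can still earn yellows.
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    yellows = 0
    for g, a in zip(guess, answer):
        if g != a and remaining[g] > 0:
            yellows += 1
            remaining[g] -= 1  # each copy in the answer can only light up one tile
    return greens, yellows

def criteria_scores(guess, answers):
    """(1st criterion, 2nd criterion, combined) summed over every possible answer."""
    total_greens = total_yellows = 0
    for answer in answers:
        g, y = feedback(guess, answer)
        total_greens += g
        total_yellows += y
    return total_greens, total_yellows, total_greens + total_yellows

# Hypothetical sample standing in for the winning set.
sample = ["arose", "stare", "bayou", "eerie", "soapy"]
print(feedback("speed", "abide"))        # (0, 2): only one e in "abide"
print(criteria_scores("roate", sample))  # (6, 8, 14)
```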
Here are the numbers:
[Table: each word’s score for the 1st criterion, the 2nd criterion, and combined (1st + 2nd)]
Conclusion: SOAPY wins on the first criterion, having the letters most likely to be in the right position, and ROATE wins on the second, having the letters most likely to appear in the word even if they are in the wrong position. ROATE also has the highest combined score, but STARE comes out on top overall: its combined score is almost as good as ROATE’s, and it gets bonus points for being in the set of possible winning words.
† These words exist in the guessable set but not the winning set.