WARNING: This blog contains racist language and Wordle strategies that are of no use to any living person.
It started with a Twitter discussion about the best starting word for Wordle, of course.
It was Twitter, so everybody had strong opinions on the subject, of course.
And, of course, of course, of course, I did something in Excel.
I didn’t mean to. We were talking about the best opening words, so I innocently downloaded a list of 5-letter words and pasted it into an Excel column, then wrote a formula to work out, for each letter of the alphabet, what fraction of words it appears in. Once you have that it’s easy to assign each word a score, by adding up the frequency of all its letters (excluding duplicates).
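For anyone who'd rather not do this in Excel, here is a minimal Python sketch of the same scoring idea. The six-word list is a stand-in I made up for illustration (the real sheet used a downloaded list), and the function names are mine, not anything from the spreadsheet.

```python
from collections import Counter

def letter_frequencies(words):
    """Fraction of words containing each letter (a repeated letter counts once per word)."""
    counts = Counter()
    for word in words:
        counts.update(set(word))
    return {letter: n / len(words) for letter, n in counts.items()}

def word_score(word, freqs):
    """Sum of the frequencies of the word's distinct letters (duplicates excluded)."""
    return sum(freqs.get(letter, 0.0) for letter in set(word))

# Tiny illustrative list, NOT the real word list.
words = ["AROSE", "STARE", "TEARS", "YEAST", "CAMPS", "LOLLY"]
freqs = letter_frequencies(words)
ranked = sorted(words, key=lambda w: word_score(w, freqs), reverse=True)

for w in ranked:
    print(w, round(word_score(w, freqs), 3))
```

Because duplicates are excluded, LOLLY only gets credit for L, O and Y, which is why it lands at the bottom of this toy ranking. Anagrams such as STARE and TEARS always score the same.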
My argument was that the best starting word was the one that scored the highest (i.e. it contained the most frequently used letters). Despite this logic being based on actual, literal, numbers it was not universally accepted.
The thread provided seven loose strategies for the starting word:
1. Start with AROSE, my highest scoring word.
2. Actually, you should ignore the letters in AROSE, because you can easily fill them in afterwards, actually.
3. Rise to the challenge with YEAST, the word I typically use (or rather used) as my starting one.
4. Take into account that S is the most common starting letter for words – STARE was the highest scoring word I had starting with S.
5. Put the S at the end, where it belongs. TEARS (unsurprisingly) scores the same as STARE and was the highest scoring word ending in S in my list.
6. Just start with the first word that comes into your head.
7. Make guesses that you know are wrong, to get extra letters.
I discounted strategy 7 straight away, because you could also get extra letters by guessing answers that might be right. That is one of many things I regret from my youth, 4 days ago.
Anyway, I quickly spotted that with my word scoring list I could add a mask for letters I knew were in the right place. If, for example, you know that the R in AROSE is in the correct place then the mask ?R??? will filter out all of the words that don’t have R as their second letter. Similarly, if you know that the A and the O are incorrect then you can remove all the words containing either of those letters, and if the S and the E are correct, but in the wrong place then you can remove both the words that don’t have either of those letters and the ones which fit the masks ???S? and ????E.
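The filtering described above can be sketched in Python roughly like this. It is a simplified model under my own assumptions: the function name, its parameters and the word list are all hypothetical, and it ignores Wordle's special handling of repeated letters.

```python
def is_possible(word, green_mask="?????", absent="", yellows=()):
    """True if `word` is consistent with what the feedback has revealed so far.

    green_mask: '?' for unknown slots, a letter where that letter is known to be right.
    absent:     letters known not to be in the answer.
    yellows:    (letter, index) pairs: the letter is in the answer, but not at that index.
    """
    # Every known green letter must match.
    if any(m != "?" and m != c for m, c in zip(green_mask, word)):
        return False
    # No letter that has been ruled out may appear.
    if any(letter in word for letter in absent):
        return False
    # Yellow letters must be present, but not where they were guessed.
    for letter, index in yellows:
        if letter not in word or word[index] == letter:
            return False
    return True

# Illustrative list only. The feedback modelled is the AROSE example above:
# R green in slot 2, A and O absent, S and E present but misplaced.
words = ["TREES", "URGES", "FRESH", "BRASH", "PRISE", "GRIND", "DREGS"]
remaining = [
    w for w in words
    if is_possible(w, "?R???", absent="AO", yellows=[("S", 3), ("E", 4)])
]
print(remaining)
```

FRESH and PRISE drop out because they match the ???S? mask, BRASH because it contains an A, and GRIND because it has neither S nor E.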
Then, by having my letters table only count the letters in words that hadn’t been excluded, I could generate new scores for each letter, and then new scores for all the non-excluded words and Excel would be offering me my next guess. I’d accidentally made an Excel sheet that could play Wordle!
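As a rough illustration of that loop (re-count the letters among the surviving words, re-score, guess again), here is a small Python sketch. It is not a translation of the spreadsheet: the names, the alphabetical tie-break and the eight-word list are my assumptions, and it filters by comparing feedback patterns rather than by masks.

```python
from collections import Counter

def feedback(guess, answer):
    """Wordle-style feedback: 'G' green, 'Y' yellow, '-' grey."""
    result = ["-"] * len(guess)
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            unmatched[a] += 1  # answer letters still available for yellows
    for i, g in enumerate(guess):
        if result[i] == "-" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)

def best_guess(candidates):
    """Highest-scoring candidate, with letters counted among `candidates` only."""
    counts = Counter()
    for word in candidates:
        counts.update(set(word))
    # sorted() first, so ties go to the alphabetically earliest word (an assumption).
    return max(sorted(candidates), key=lambda w: sum(counts[c] for c in set(w)))

def play(answer, words, start, max_guesses=6):
    """Number of guesses used, or None if the game is lost. Assumes `answer` is in `words`."""
    candidates = list(words)
    guess = start
    for turn in range(1, max_guesses + 1):
        if guess == answer:
            return turn
        seen = feedback(guess, answer)
        # Keep only the words that would have produced the same feedback.
        candidates = [w for w in candidates if w != guess and feedback(guess, w) == seen]
        guess = best_guess(candidates)
    return None

words = ["AROSE", "STARE", "TEARS", "YEAST", "CAMPS", "LOLLY", "DOUGH", "PEAKY"]
print(play("PEAKY", words, start="AROSE"))
```

With a list this small the first guess usually leaves a single candidate, so the games are short. The real list is where the interesting failures show up.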
Having such a sheet it would be criminal not to use it to prove my AROSE hypothesis, by having it pick answers at random and play Wordle against itself.
It should be said, straight away, that it’s not quick. After every guess it has to re-evaluate the formulas alongside each of the 5,756 words in my list and, for example, the one that works out whether each word is a valid guess or not looks like this:
=(COUNTIF(A1,$K$12))*(NOT((OR(ISNUMBER(MATCH(MID(A1,ROW($A$1:$A$5),1),M:M,0))))))*(AND(ISNUMBER(FIND(IFERROR(INDIRECT("N1:N"&COUNTA(N:N)),A1),A1))))*(IFERROR(SUMPRODUCT(--(COUNTIF(A1,INDIRECT("O1:O" & COUNTA(O:O))))),0)=0)
This means that the macro I wrote, to make and evaluate guesses and play game after game, has to slow itself down, so that Excel’s formula recalculation can keep up. Even so, it’s much faster than a human player and can play 1,000 games in about an hour…and 1,000 games seemed like a sensible number to generate some statistics.
The first try was with AROSE as the starting point and, an hour later, it returned a win rate of 91%, losing only 90 games in the 1,000 it ran. Of the 910 games it did win, it did so in an average of 4.16 guesses. Both of these results seemed pretty good to me, and my gut feeling was it was better than a human player could do.
The next 1,000 games used my starting word, YEAST, and the results were promising. The win rate was slightly lower, at 90.4%, and the average number of guesses was slightly higher, 4.2.
My work was done. AROSE was the best starting word, YEAST was good, but not quite as good, and my little Excel sheet was playing Wordle perfectly.
Really, just as a formality, I ran it against some of the other strategies.
TEARS returned a win rate of 92.9%, the highest yet, and the lowest average number of tries, just 4.06.
This wasn’t too much of an upset. Although 92.9% is undoubtedly higher than my golden boy, AROSE, it’s not significantly so, statistically speaking. In other words, the chance was greater than 5% that the difference between the two win rates was down to random chance, rather than representing a genuine performance difference. TEARS had got lucky.
STARE, however, proved harder to explain. It produced a win rate of 95.4%, which is significantly better than AROSE. My WORLD was crumbling.
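For the curious, the significance claims can be sanity-checked with a pooled two-proportion z-test. That choice of test (and making it one-sided) is my assumption, not necessarily what was run at the time. The win counts are the percentages above applied to 1,000 games each.

```python
import math

def one_sided_p(wins_a, wins_b, games=1000):
    """Chance of seeing B beat A by at least this much if both had the same true win rate.

    Pooled two-proportion z-test, upper tail only, equal number of games per run.
    """
    p_a, p_b = wins_a / games, wins_b / games
    pooled = (wins_a + wins_b) / (2 * games)
    standard_error = math.sqrt(pooled * (1 - pooled) * 2 / games)
    z = (p_b - p_a) / standard_error
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability

p_tears = one_sided_p(910, 929)  # AROSE 91.0% vs TEARS 92.9%
p_stare = one_sided_p(910, 954)  # AROSE 91.0% vs STARE 95.4%
print(f"TEARS vs AROSE: p = {p_tears:.3f}")
print(f"STARE vs AROSE: p = {p_stare:.5f}")
```

Under those assumptions the TEARS result comes out just above the 5% line and the STARE result far below it, which matches the verdicts above.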
Strategy 2 – avoid the common letters – brought it tumbling down. Completely off the top of my head I picked CAMPS as a word that mixed common letters with mid-tier ones, and it produced a score of 95.4%!
If a word that I’d picked at random could do so well then what was the next test, starting with a random word, going to do?
My theory was that picking at random would be a terrible strategy. No-one would, for example, pick LOLLY as a starting word, but my random selector could. Stupid random selector. There was not, therefore, unrestrained joy when the random words strategy returned a win-rate of 92.4%.
Again, this wasn’t statistically significantly higher than the word I’d been asserting, 24 hours earlier, was unbeatable, but it was annoying that all these insignificant differences were insignificantly higher.
In a fit of pique, I selected the lowest scoring word, GYPPY, as a starting point.
Here, at last, finally I found a word that performed worse than AROSE. Yes, indeed, out of its 1,000 games it got 3 fewer right than AROSE had. Three! FUCKING THREE!!!
If you’ve been keeping count you’ll know that, by now, I’ve played 7,000 games, which means that there’s about a 70% chance that one of the games would have guessed the answer first try. Well, it did, and it was fucking GYPPY.
GYPPY can fuck right off!
[Apologies, by the way, for using that word. I didn’t vet the list of words, which wouldn’t have helped anyway, because I didn’t know what it meant, and it was selected for use by an algorithm. I’ll not mention it again]
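The "about 70%" figure can be checked with a couple of lines of Python. The sketch assumes each game's answer is drawn uniformly from the 5,756-word list, so any given first guess is right with probability 1/5,756.

```python
WORDS = 5_756  # size of the word list
GAMES = 7_000  # seven runs of 1,000 games

# Chance that a single game is NOT solved on the first guess.
p_miss_once = 1 - 1 / WORDS

# Chance that at least one of the games IS solved on the first guess.
p_at_least_one = 1 - p_miss_once ** GAMES

print(f"{p_at_least_one:.0%}")
```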
Those first 7 runs also gave me a list of “hard” words – 387 answers that one or more of the runs had failed on. Just for comparison I tried all the start words I’d used so far against that list of words.
AROSE, you will not by now be surprised to hear, stunk, solving just 32.6% of them. Even YEAST did better, with 35.9%. Both were well behind TEARS’ 42.1%; STARE and starting with a random word both managed to crack 48.1% of them, and CAMPS got 60.7%!
Why was CAMPS so good? In an effort to explain I picked the exact mid-point of the league table of words, PEAKY, and tried that. In results that are blindingly obvious in hindsight, it scored almost exactly the same as starting with a random word.
More than ten thousand games of Wordle had so far taught me:
- My “best possible” starting word was worse than pretty much everything else I’d tried
- CAMPS was a supernaturally good starting word, but I had no idea why
- Ergo, I had no idea what a good strategy was any more
A wise friend suggested I should move to a two-word strategy. The one she uses starts with ATONE and then, unless she gets 2 green letters or 3 yellows (or, presumably, a green and 2 yellows), she moves on to SHIRK.
This was the discounted strategy 7, selecting a word that you know is wrong, to get more letters (which Wordle doesn’t allow, if you play in hard mode, but nobody does, so screw that).
The wise friend also mentioned she was thinking of moving to using STONE then HAIRY and I, being in the slow-learners class at school, suggested that DOLES and TRAIN would be better, because they use all 10 of the most common letters in 5-letter words.
Just for completeness I paired up the unbeatable CAMPS with another mid-tier word that I’d pulled out of my arse, DOUGH (the word, not the arse), and ran that as well.
The STONE/HAIRY combo managed to equal CAMPS’ 95.4% record, and with a slightly lower average number of guesses (4.06 vs 4.12). ATONE/SHIRK managed a practically-the-same result of 95.2%, CAMPS/DOUGH was nearly a whole percentage point worse than CAMPS by itself and, of course, my DOLES/TRAIN was bottom of the pack, with only 93.2% of games won.
I now, finally, realised the problem was the BEARS-trap, which is this: if you get to a point where you know the word is in the form ?EARS then there are a bunch of letters that could go first, B, D, F, G, etc. My algorithm was very good at getting to that point but would then go through the possible answers alphabetically. This was not a winning strategy, but no worse than simply guessing a first letter.
What was needed was to eliminate as many letters as possible before you get to your final guess, so that if you arrive there knowing the word looks like ?EARS then you’ve as few remaining answers as possible to test.
This will never be perfect, not least of all because you can’t eliminate 26 letters in 5 x 5-letter guesses but also because SEARS and REARS are also valid answers, so even if you had eliminated every other letter, it would still be a 50/50 chance.
My first line of attack was to extend the two-word strategy to a four-word strategy, QUICK/BROWN/FATLY/HEMP, with those being the first 4 guesses unless all the letters were discovered earlier.
Here, at last, I managed to topple the unexpected CAMPS from its top spot, with a win rate of 97.4%, higher at a p < 0.01 level of significance, i.e. there’s less than a 1% chance that the difference in scores was just luck. The trade-off for this was the average number of guesses taken to win leaping up to 5.16.
At this point I did quite a lot of experimenting with a way to make the 4-word strategy more dynamic. My initial impulse was to value wrong guesses near the start of the game and then home in on an answer in the final couple of guesses.
Of course, if you find yourself in the BEARS-trap after guess 4 then what you need for guess 5 is a word that tests as many of the remaining possibilities as it can. Your 5th guess shouldn’t be anything like ?EARS, it should be testing possible first letters.
I realised that ‘possible’ was the key word in that sentence. At the start of the game you can guess words that will give you extra letters, willy-nilly, but as the answer takes shape you need to restrict yourself to words that will give you additional letters that could possibly be in the answer. If you have ?EARS then there’s no point guessing QUICK, even if you haven’t yet checked any of those 5 letters, because none of them can be right.
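One way to read "possible" in code: score a guess by how many untried letters it contains that also appear in at least one answer still in play. This is only a sketch of the idea. The function names, the four remaining answers and the list of probe words are all made up for illustration.

```python
def useful_letters(possible_answers, tried):
    """Untried letters that occur in at least one still-possible answer."""
    return set("".join(possible_answers)) - set(tried)

def probe_score(guess, possible_answers, tried):
    """Number of distinct useful letters that a guess would test."""
    return len(set(guess) & useful_letters(possible_answers, tried))

# Stuck at ?EARS with four answers left (illustrative, not the full set).
possible = ["BEARS", "DEARS", "FEARS", "GEARS"]
tried = set("AROSE")  # letters already tested by earlier guesses

# Hypothetical probe words; the real sheet would rank the whole word list.
guesses = ["QUICK", "BEARS", "FUNGI", "BADGE"]
for g in guesses:
    print(g, probe_score(g, possible, tried))

best = max(guesses, key=lambda g: probe_score(g, possible, tried))
print("best probe:", best)
```

QUICK scores zero here even though none of its letters have been tried, because none of them can be the missing first letter. BEARS, a perfectly possible answer, only tests one candidate, while BADGE tests three at once.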
I adapted my formulas, which were now also counting how many untried letters were in each word, to only count untried letters that appeared in still-possible answers, and tested this solution with both AROSE and CAMPS as the starting word.
I’ve still no idea why CAMPS is so much better. Maybe it’s not the best starting word, but as it would take 240 days of continuous playing to try all the start words, I’m not about to check the lot.
Just for completeness I also ran AROSE and CAMPS against my list of ‘hard’ words. AROSE won in 86.3% of those games, CAMPS in 93.3%
Now, after more than 20,000 games of Wordle, I’m confident that I have a system that (a) works as near as dammit all the time and (b) is absolutely no use to anybody not using a computer to play for them, unless they can mentally tally up valid, untried letters and words containing them.
Oh, yes, and I was dead wrong about everything I said in that Twitter thread. Further tests are necessary to determine if this is generally true of all Twitter threads I post in.