A Spanish data scientist’s strategy to win 99% of the time at Wordle


SOURCE: ENGLISH.ELPAIS.COM
JAN 14, 2022

There are two types of social media users: those who are addicted to Wordle and those who are puzzled by the little green and yellow squares their friends keep sharing on Twitter. The simple word game consists of guessing a hidden five-letter word in six tries. After each guess, a correct letter in the correct position shows up green, a correct letter in the wrong position shows up yellow, and a letter that is not in the word shows up grey. It may sound basic, but the formula has become a viral phenomenon in recent months, with the 90 fans who played it in November soaring to a current 300,000.

The inventor of the original English version, Welsh-born software engineer Josh Wardle, created it during the pandemic to entertain his partner – a word game addict, as he told The New York Times. He then got his family to play on a group chat and it later became popular thanks to Twitter and Facebook.

The game’s simplicity is, in fact, the key to its success. The player accesses a no-frills page, with no fee, registration or advertising, and tries to guess a five-letter word in a similar format to the legendary code-breaking game, Mastermind. Wardle only poses one challenge a day, an idea he borrowed from The New York Times’ Spelling Bee, so his followers have to wait 24 hours to play again. But the game is perhaps more engaging because of this. Once it has been played several times, a question arises: is there an optimal method to solve the puzzle and reduce the number of tries?

Esteban Moro, a professor, researcher and data scientist at Madrid’s Carlos III University and visiting professor at the Massachusetts Institute of Technology (MIT), has sought a scientific answer to this question. In an article published on his blog, he described a strategy that would solve 99% of the 206 challenges posed so far by Wordle in less than six steps, although this method cannot be applied to other versions of the game, such as those that have been circulating in Spanish and even Galician.

His strategy is based on two factors: start the game with a word identified as the best option, and make successive attempts following a simple rule. But how do you find this rule?

For his calculations, Moro used R, a free software programming language for statistical analysis, to reproduce Wordle on his computer. He created a game with the same rules that includes all the 12,972 five-letter words that exist in the English language. The program then simulated successive games, always starting with the word “aeros,” which has the five most common letters used in English. In each of the next five attempts, a word is chosen at random from among all those that could still fit the solution. With these instructions, the program succeeded in finding the solution in less than six steps 80% of the time when it had to guess a randomly chosen word, with an average of 5.1 attempts. And it solved almost 90% of the puzzles, with an average of 4.7 tries, when given one of the more than 200 puzzles already proposed by Wardle.
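The simulation described above can be sketched in a few lines. This is a minimal illustration in Python, not Moro's R code: the function names `score` and `play`, the tiny word list and the way repeated letters are scored are all assumptions made for the example.

```python
import random


def score(guess, answer):
    """Feedback for a guess: 'g' green, 'y' yellow, '-' grey, one per letter."""
    result = ["-"] * len(guess)
    remaining = {}  # answer letters that were not matched as green
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "g"
        else:
            remaining[a] = remaining.get(a, 0) + 1
    for i, g in enumerate(guess):
        if result[i] == "-" and remaining.get(g, 0) > 0:
            result[i] = "y"
            remaining[g] -= 1
    return "".join(result)


def play(answer, words, first_guess, rng, max_tries=6):
    """Return the number of guesses used, or None if the word is not found.

    Assumes the answer is one of the words in the list.
    """
    candidates = list(words)
    guess = first_guess
    for attempt in range(1, max_tries + 1):
        if guess == answer:
            return attempt
        pattern = score(guess, answer)
        # Keep only the words that would have produced the same colours.
        candidates = [
            w for w in candidates
            if w != guess and score(guess, w) == pattern
        ]
        guess = rng.choice(candidates)  # random pick among what still fits
    return None


# Hypothetical five-word dictionary, just to exercise the code.
words = ["aeros", "orate", "belly", "jelly", "table"]
tries = play("jelly", words, "aeros", random.Random(0))
print(tries)
```

Running `play` over every word in a full dictionary and averaging the results would give figures comparable in kind to the ones quoted above, though the exact numbers depend on the word list used.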

But there was a way to improve these statistics. Other researchers discovered that the game’s solutions are not chosen randomly from among the more than 12,000 possibilities: some words were more likely to come up than others. Cross-referencing the correct answers from previous Wordles with a body of the most commonly used English terms, Moro confirmed that Wardle chooses frequently used words in English, something the game’s inventor also pointed out in his interview with The New York Times, which mentioned that he avoided rare words. “It makes perfect sense,” says Moro from his home in Boston. “For the game to be a success, it needs to be simple and playable, and picking the most common terms means that in the end, we all get it right in just a few tries.”

Moro then changed the algorithm. He programmed the simulations so that, also starting the game with the word “aeros,” he would always then choose the most used term in English among all the possibilities, with the help of a tool that orders the words according to frequency of use. The results hardly improved for randomly chosen words, but the strategy proved much more effective for words that Wardle had already proposed in his challenges: the program solved 97% of the puzzles in an average of 3.9 attempts.
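The change amounts to replacing the random pick with a pick by frequency. The Python sketch below shows only that selection rule; the function name `pick_most_frequent` and the frequency values are invented for illustration and do not come from Moro's tool.

```python
def pick_most_frequent(candidates, freq):
    """Choose the candidate with the highest usage frequency.

    Words missing from the frequency table are treated as frequency 0.
    """
    return max(candidates, key=lambda w: freq.get(w, 0.0))


# Made-up frequencies (arbitrary units), not real corpus data.
freq = {"belly": 3.0, "jelly": 2.0, "telly": 0.5}
candidates = ["telly", "jelly", "belly", "felly"]
print(pick_most_frequent(candidates, freq))  # prints "belly"
```

Because the rule is deterministic, the same starting word and the same feedback always lead to the same sequence of guesses, which is one reason the results depend so heavily on how the solutions were chosen.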

“I’m a data scientist and, as such, I’m always looking for these slants and patterns that help us make algorithms. So, what I’ve done is to see there was a bias in the words Wardle chose, and exploit this to improve the strategy,” Moro explains.

Was there anything else that could be done to improve the method? Perhaps change the starting word? The letters of “aeros” include the five most frequent letters used in English (as Edgar Allan Poe pointed out in the cryptographic challenge included in his famous short story The Gold-Bug), but Moro noticed that in the more than 200 solutions published so far in the original version of the game, the “t” appeared more often than the “s.” He then changed the initial word “aeros” to “orate” and – maintaining the rule of always choosing the most frequently used word thereafter – the algorithm solved 99% of the puzzles posed by Wardle. Moro points out, however, that this two-point improvement in the results could be a statistical fluke, and that more data would be needed to assess whether it is significant.
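The kind of letter count behind that observation is easy to reproduce. The Python sketch below counts how often each letter appears in a list of words; the words are placeholders chosen for the example, not actual Wordle solutions, and `letter_counts` is a hypothetical helper name.

```python
from collections import Counter


def letter_counts(words):
    """Count every occurrence of every letter across the given words."""
    return Counter("".join(words))


# Placeholder words, NOT the real list of past Wordle answers.
sample = ["tiger", "point", "sugar", "table", "those"]
counts = letter_counts(sample)
print(counts["t"], counts["s"])  # prints "4 2"
```

Applied to the real list of past solutions, the same count would show whether “t” does outnumber “s,” which is the comparison that motivated the switch from “aeros” to “orate.”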

Wardle deliberately chooses more common words to make the game more user-friendly. But a super-difficult Wordle could be programmed, using rarer terms or containing, for example, several letters common to many words. “In English, it would be quite difficult to guess the word ‘belly,’ for example, because there are many words that end in those three letters,” Moro explains. In the case of deliberately including rare words, Moro’s method would not work, and new biases would have to be detected and the algorithm adjusted so that it would choose, for example, the least used terms or those most similar to others.

Aside from nailing the best possible strategy, there is another recurring question thrown up by Wordle, which is why it has become so popular all over the world. Moro is not alone in believing that part of it has to do with the fact that it brings a certain serenity to our high-octane lifestyles. “Because Wardle publishes only one puzzle a day, that slowness brings us synchronized, unhurried social interaction. And that’s one of the successes of the game,” he explains.
