Wordle Wordlebot Analytics Data Puzzle Game Efficiency

I love word games. For nearly a decade I had a not-so-healthy obsession with Words With Friends, playing up to a dozen games simultaneously each day with players who were, in truth, not my friends but random opponents. However, in recent years the once-popular mobile app has fallen out of favor, leaving me and a lingering crew of oddballs who would initiate bizarre conversations with me via the in-app chat function (see Figure 1 for one of many examples). Consequently, my involvement in the app waned and eventually died out.

Figure 1: Snippet of a Words With Friends in-app conversation with a stranger (one of many). My attempt at humor always went unappreciated (Image by author).

Then I heard about this game called Wordle that was sweeping the nation. No downloads needed, no account logins either, no ads, and no unsolicited messages from randos. I was intrigued. For those unfamiliar with Wordle, the object of the game is to guess the 5-letter English language word of the day (the “Wordle”) in six tries. The game tells you if you guessed a letter correctly and also if you guessed its position correctly within the word.
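To make the feedback rule concrete, here is a minimal sketch of how a guess could be scored against the answer. The function name `score_guess` and the `G`/`Y`/`.` symbols are my own illustrative choices, and the handling of repeated letters reflects my understanding of the game rather than its actual source code.

```python
from collections import Counter

def score_guess(guess: str, answer: str) -> str:
    """Return one symbol per letter: 'G' = right letter, right spot;
    'Y' = letter is in the word but elsewhere; '.' = letter not available."""
    guess, answer = guess.upper(), answer.upper()
    feedback = ["."] * len(guess)

    # Answer letters that were not matched exactly can still earn a 'Y'
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)

    # First pass: exact position matches
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            feedback[i] = "G"

    # Second pass: right letter in the wrong position, limited by what is left
    for i, g in enumerate(guess):
        if feedback[i] == "." and remaining[g] > 0:
            feedback[i] = "Y"
            remaining[g] -= 1

    return "".join(feedback)

print(score_guess("SALSA", "AISLE"))
```

The two-pass structure matters for words with repeated letters: a second S or A in the guess is only marked if the answer still has an unmatched copy of that letter.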

After playing for a week I started wondering: what is the optimal first guess? With your very first guess you’ve got nothing to go on, right? Well, that’s not completely true. There is some a priori knowledge we have going in. For example, we know the right word must be an English language word that is 5 letters long. It probably needs one or more vowels, or at least a Y. Also, you might have heard that E is the most used letter in the English language¹ (though this is not necessarily true of all 5-letter English words). The scientist in me itched to discover a way to decide on the best Wordle guess using publicly available word data.


Contrary to what you might read online, it’s not always necessary to approach a data problem with a state-of-the-art machine learning model. You may not need to spend days worrying about tuning hyperparameters, cross-validation, or exactly how many hidden layers to include in your neural network. Sometimes a cup of coffee, a few dozen lines of code, and a couple bar charts can get you pretty far in a short amount of time.

The first step toward gleaning some quick insights was to find a lexicon of all words in the English language. Luckily, this proved to be fairly easy to do (because the Internet). I found a text file on GitHub in a repo by dwyl that contains 370,000 words. For the purposes of Wordle we only care about 5-letter words, which leaves us with about 16,000 words. The next thing I wanted to know is what the most common letters are among 5-letter words. For this I needed to compute the frequency of each letter, the results for which are shown in Figure 2.
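A minimal sketch of this step could look like the following. It is not the code from my repo: the helper names are made up for illustration, the file name in the comment is an assumption about how the word list is stored locally, and a tiny stand-in sample is used so the snippet runs on its own.

```python
from collections import Counter

def five_letter_words(words):
    """Keep purely alphabetic 5-letter words, upper-cased and de-duplicated."""
    cleaned = (w.strip().upper() for w in words)
    return sorted({w for w in cleaned if len(w) == 5 and w.isalpha()})

def letter_frequencies(words):
    """Return {letter: share of all letters}, most common letter first."""
    counts = Counter("".join(words))
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.most_common()}

# Stand-in sample. With the full list you would read the file instead, e.g.
#   with open("words_alpha.txt") as f:   # assumed local file name
#       words = five_letter_words(f)
sample = ["aisle", "raise", "salsa", "wordle", "solar", "can't", "aisle"]
words = five_letter_words(sample)
freq = letter_frequencies(words)

for letter, share in freq.items():
    print(f"{letter}: {share:.1%}")
```

Plotting `freq` as a bar chart gives the kind of picture shown in Figure 2.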


Figure 2: Frequency distribution of letters found in 5-letter words. 10.5% of letters are A, 9.8% of letters are E, and so on (Image by author).

I was a bit surprised to see that among 5-letter words A is the most common, comprising 10.5% of all letters. The letter E is a close second at 9.8%, followed by S at 8.2%. Unsurprisingly, all vowels sit in the top 10 while letters like V, Z, J, X, and Q are the least frequent. Another interesting way to view the data is to take the cumulative sum to obtain a cumulative frequency version of Figure 2. Figure 3 below shows what that distribution looks like.


Figure 3: Cumulative frequency distribution of letters found in 5-letter words. We can see that the top 7 letters account for over half of all letters (Image by author).
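The cumulative view is just a running total over the sorted frequencies. Here is a rough sketch, assuming the frequencies are held in a dict of letter to share; the function names and the toy numbers are hypothetical and are not the values from Figure 2.

```python
from itertools import accumulate

def cumulative_frequencies(freq):
    """Given {letter: share}, return [(letter, running share)], most common first."""
    ordered = sorted(freq.items(), key=lambda kv: kv[1], reverse=True)
    running = accumulate(share for _, share in ordered)
    return [(letter, total) for (letter, _), total in zip(ordered, running)]

def top_letters_covering(freq, threshold=0.5):
    """Shortest run of most-common letters whose combined share exceeds threshold."""
    letters = []
    for letter, total in cumulative_frequencies(freq):
        letters.append(letter)
        if total > threshold:
            break
    return letters

# Toy distribution for illustration only (not real letter frequencies)
toy_freq = {"C": 0.2, "A": 0.4, "D": 0.1, "B": 0.3}
cumulative = cumulative_frequencies(toy_freq)
top = top_letters_covering(toy_freq, threshold=0.5)
print(cumulative)
print(top)
```

Applied to the real frequencies, `top_letters_covering` is one way to read off how many letters it takes to pass the halfway mark.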

The great thing about the cumulative distribution is that we can clearly see that the top seven letters (A, E, S, O, R, I, L) account for over half (53%) of all letters among 5-letter words! It seems logical that the best first Wordle guesses would be words that contain only these top 7 letters. It turns out that there are 231 words in this lexicon that only use the letters A, E, S, O, R, I, and L. So we’ve now shrunk our set of 16,000 5-letter words to 231, a 98.5% reduction!
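Restricting the lexicon to words built from the top seven letters is a simple set check. This is a sketch rather than the code from my repo; `uses_only` is a made-up helper name and the word list is a small hand-picked sample.

```python
TOP_SEVEN = set("AESORIL")

def uses_only(word, allowed=TOP_SEVEN):
    """True if every letter of the word is in the allowed set."""
    return set(word.upper()) <= set(allowed)

# Hand-picked sample, not the full 16,000-word lexicon
sample = ["SALSA", "AISLE", "MINCE", "SOLAR", "WINCE", "RAISE"]
candidates = [w for w in sample if uses_only(w)]
print(candidates)
```

Note that this check alone still lets through words with repeated letters.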


But these 231 words include some like SALSA, which (while delicious) only consists of 3 distinct letters and thus diminishes our ability to eliminate letters in our first attempt. Therefore, a better word to guess first would be one that uses only the top seven letters and only uses each letter once. Imposing this requirement leaves us with only 60 words, which is a more reasonable amount that we can sort through quickly.
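The extra requirement is that no letter repeats, which can be checked by comparing the number of distinct letters to the word length. A small sketch under the same caveats as before: the helper names are illustrative and the sample is hand-picked.

```python
TOP_SEVEN = set("AESORIL")

def has_distinct_letters(word):
    """True if no letter appears more than once."""
    return len(set(word)) == len(word)

def is_good_first_guess(word, allowed=TOP_SEVEN):
    """Only top-seven letters, each used at most once."""
    word = word.upper()
    return set(word) <= set(allowed) and has_distinct_letters(word)

# Hand-picked sample, not the full list of 231 words
sample = ["SALSA", "AISLE", "SOLAR", "LASSO", "RAISE", "AROSE"]
first_guesses = [w for w in sample if is_good_first_guess(w)]
print(first_guesses)
```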

One important caveat with this entire analysis is that I am using as input a full list of 5-letter English words. In fact, the list of possible Wordle words is a pared-down collection of about 2,500 words that are the most common.² Having access to this list of 2,500 words would greatly improve the results here. In lieu of this, I can instead manually sift through our list of 60 words and remove the more esoteric ones. The remaining words form my curated list of 21 Best Words to Use as Wordle Guesses (in alphabetical order):


AISLE
ALOES
ARISE
AROSE
EARLS
LAIRS
LASER
LIARS
LIERS
LORIS
LOSER
OILER
ORALS
RAILS
RAISE
REALS
RILES
ROILS
ROLES
SLIER
SOLAR


Let’s actually try some of the words on this list as the first guess in a couple of real Wordle games. In Figure 4 I show my Wordle results from Jan 22 (left) and Jan 23 (right).

Figure 4: LEFT: My Wordle (#217 from 22 Jan 2022) where I used the very first word, AISLE, from my curated list of 21 Best Words to Use as Wordle Guesses. RIGHT: My Wordle (#218 from 23 Jan 2022) where I tried another word from my list, RAISE, as a first guess (Author’s screenshot of Wordle).

Guessing AISLE on Jan 22 helped me solve the Wordle in 3 tries! After the initial guess, I used the frequencies from Figure 2 above to help guide my subsequent guesses. For example, I guessed MINCE before WINCE because M appears more frequently than W. Amazingly, each correct letter I guessed was in its correct position, illustrating that the role of luck in this game cannot be discounted. Guessing RAISE on Jan 23 also allowed me to solve the puzzle in 3 tries. I again referred to what I had learned about frequencies to help me select the next letters to guess.
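The MINCE-before-WINCE reasoning can be written down as a simple ranking: score each remaining candidate by the summed frequency of its distinct letters and try the highest first. This is only a sketch of that heuristic. The function name is hypothetical, and the shares below are placeholders chosen so that M ranks above W as described above; they are not the Figure 2 values.

```python
def rank_by_letter_frequency(candidates, freq):
    """Order candidates by the summed share of their distinct letters, highest first."""
    def score(word):
        return sum(freq.get(letter, 0.0) for letter in set(word.upper()))
    return sorted(candidates, key=score, reverse=True)

# Placeholder shares for illustration only (NOT the measured frequencies)
illustrative_freq = {"M": 0.03, "W": 0.01, "I": 0.06, "N": 0.05, "C": 0.03, "E": 0.10}

ranked = rank_by_letter_frequency(["WINCE", "MINCE"], illustrative_freq)
print(ranked)
```

Because the two words differ in a single letter, the ranking comes down to which of M and W is more common.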


Now equipped with some data-driven insights, go forth and Wordle!

The code I wrote for this analysis along with the bar charts above can be found in my WORDLE-VISION GitHub repo.


Source: https://towardsdatascience.com/wordle-vision-simple-analytics-to-up-your-wordle-game-65daf4f1aa6f