Machine learning programs have recently made huge advances. Stephen Marche tried one out on Shakespeare’s collected works to see whether it could help him work out which of the several versions of Hamlet’s soliloquy the playwright most likely intended.
In the First Quarto, sometimes called the “bad quarto,” the famous “To be, or not to be” speech begins this way:
To be, or not to be, ay there’s the point, / To Die, to sleep, is that all? Aye all: / No, to sleep, to dream, aye marry there it goes.
Nobody wants to believe that Shakespeare wrote this crap...
...Cohere works on an entirely different level. It doesn’t require identifying function words or phrases. It just converts language into logarithmic probabilities. You create a Shakespeare algorithm. You put in each of the three different versions of “To be, or not to be” and out pop numbers: -3.6788540925266906 for the First Quarto, -3.179199017199017 for the Second Quarto, and -3.4799767386091127 for the Folio. The closer the number is to zero, the more likely the model thinks the sequence is.
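To make the idea of scoring a passage by log probability concrete, here is a minimal sketch in Python. It is not Cohere's API or Marche's actual setup: the tiny bigram word model, the `train_bigram` and `avg_log_prob` helpers, and the one-line training corpus are all hypothetical stand-ins for a large language model tuned on Shakespeare. It also assumes the score is an average of natural-log probabilities per word pair, which the passage does not state. Only the three quoted scores come from the text.

```python
import math
from collections import Counter

def train_bigram(text):
    """Count words and adjacent word pairs in a lowercase, whitespace-split text."""
    words = text.lower().split()
    return Counter(words), Counter(zip(words, words[1:])), len(set(words))

def avg_log_prob(model, sentence):
    """Mean natural-log probability per word pair, using add-one smoothing."""
    unigrams, bigrams, vocab = model
    words = sentence.lower().split()
    pairs = list(zip(words, words[1:]))
    total = 0.0
    for prev, cur in pairs:
        # The +1 in the denominator reserves probability mass for unseen words.
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab + 1)
        total += math.log(p)
    return total / len(pairs)

# Toy corpus (an assumption for illustration, far smaller than any real training set).
corpus = "to be or not to be that is the question whether tis nobler in the mind to suffer"
model = train_bigram(corpus)

familiar = "to be or not to be that is the question"
scrambled = "question the is that be to not or be to"
print(avg_log_prob(model, familiar))   # closer to zero: word order the model has seen
print(avg_log_prob(model, scrambled))  # further from zero: word order it has not seen

# The three scores quoted in the passage; the one closest to zero ranks first.
scores = {
    "First Quarto": -3.6788540925266906,
    "Second Quarto": -3.179199017199017,
    "Folio": -3.4799767386091127,
}
best = max(scores, key=scores.get)
print(best)
```

Because every probability is below 1, every log probability is negative, so "closer to zero" and "larger" mean the same thing. Ranking the three quoted scores this way puts the Second Quarto first, the Folio second, and the First Quarto last.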












