Speech to text isn’t the answer to writing the next 'War and Peace'

SEP 12, 2021

Illustration by Suneesh K.

“This is the feature I have Wanted all my life as a journalist as well as a writer. What great facility it offers. You just speak your mind and there it is in print without your having to move a little finger. I wish I had discovered this 35 years ago and I am manfully and painfully type Tawaif on the first electronic typewriters in the market at my little office in Nehru Place in New Delhi. Of course there's a good reason why I didn't and the reason is simple Google just had and invented the damn thing at that point of time.”

OK, some of that is gibberish, I know, and the part about a Tawaif is scandalous. But don’t blame me. I just used the speech-to-text feature in Google Docs and this is the result. I know the fault lies not with the speech recognition system but with my peculiar way of pronouncing some words, which has the system stumped. After all, when your language suite includes English, Bengali and Punjabi, with Hindi dangling somewhere in between, your “whatever” could well sound like “Tawaif”.

For the most part, though, it is fairly accurate. Indeed, when you are an aspiring writer with a problem - the words flow when you think, but dry up the moment you sit down to type - speech to text should be a life-saver.

Only it isn’t. For one, all those errors that creep in on account of my inability to get the word to sound just right are an eyesore when I stop to review the text. And then I am painfully fixing all the mistakes. By the time I resume my dictation, the tap has run dry and instead of the gush, there’s a sorry trickle. I do have the option of finishing an entire chapter and then going back to review it, but by then I no longer remember what a particular word was or what I actually said. Thus in the example above, I have no idea now what I actually said (or meant) that got converted to “little”. My first office was certainly not little. Particularly frustrating is when you have just that right word for a moment, but since it is rarely used in the context you are sketching, the software just won’t recognize it.

Maybe it works to compose a short message or even a passage, but when it comes to using it for longer forms of writing, it does have some limitations. Which is understandable given the history of speech recognition.

According to Google’s history of its efforts to turn speech recognition into a reality, the original algorithms to convert voice input into text were trained from models based on speech patterns from GOOG-411, a quirky experimental speech recognition product launched in 2007 to look up phone numbers in the US or Canada. To quote Google’s blog on the subject: “The recognition system comprises the acoustic model, the pronunciation model, and the language model. While all three are trained separately, they are eventually composed into one gigantic search graph. Essentially, speech recognition is taking an audio waveform, pushing it through this search graph, and letting it find the path of least resistance—that is, finding the word sequence that has the maximum likelihood.”

And therein lies the problem. The whole thing is based on empirical data.
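For the technically curious, the “maximum likelihood” idea in Google’s description can be sketched in a few lines of Python. This is a toy illustration, not Google’s implementation: the scores are invented for the example, the function name best_word is hypothetical, the pronunciation model is left out, and a single word choice stands in for the “gigantic search graph”.

```python
import math

# Invented scores for illustration only; a real system learns these from data.
# ACOUSTIC[word]: how well the word matches the sound that was heard.
ACOUSTIC = {"whatever": 0.30, "tawaif": 0.60}
# LANGUAGE[(previous_word, word)]: how likely the word is to follow the previous one.
LANGUAGE = {("type", "whatever"): 0.20, ("type", "tawaif"): 0.01}


def best_word(previous_word, candidates, use_language_model=True):
    """Return the candidate with the highest combined log-probability."""
    def score(word):
        # Start from the acoustic evidence alone.
        total = math.log(ACOUSTIC[word])
        # Optionally add how plausible the word is after the previous word.
        if use_language_model:
            total += math.log(LANGUAGE[(previous_word, word)])
        return total
    return max(candidates, key=score)


candidates = ["whatever", "tawaif"]
print(best_word("type", candidates, use_language_model=False))
print(best_word("type", candidates))
```

With these made-up numbers, the sound alone favours “tawaif”, while adding the language model tips the choice to “whatever”. When an unfamiliar accent pushes the acoustic score far enough the other way, the wrong word wins.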

With Indians, the problem is magnified by the various accents we use, which make it difficult to decipher the English we speak. The speech recognition software also has to reckon with the pitch, the speed with which we speak, the lilt and the pauses or their absence. The garbled end result is understandable.

To be fair, I haven’t used most of the apps that are available in the market. My experiences are restricted to the Google and Microsoft versions. While both are decent, I am back to using two fingers for this piece while my search continues for the equivalent of the steno-typists of yore, who could figure out exactly what their bosses were saying, type like greased lightning, make corrections on the fly and even offer suggestions for improvements as they went along.

And long before Pitman’s shorthand made that possible, Milton composed verses of Paradise Lost and memorized them before dictating them to friends and assistants who transcribed them quite accurately.