Google Assistant working on ‘Personalized speech recognition’

SOURCE: 9TO5GOOGLE.COM
MAY 27, 2022

In March of 2021, Google started using federated learning on Android to improve “Hey Google” hotword accuracy. An upcoming “Personalized speech recognition” feature now looks to help Google Assistant get “better at recognizing your frequent words and names.”

About APK Insight: In this “APK Insight” post, we’ve decompiled the latest version of an application that Google uploaded to the Play Store. When we decompile these files (called APKs, in the case of Android apps), we’re able to see various lines of code within that hint at possible future features. Keep in mind that Google may or may not ever ship these features, and our interpretation of what they are may be imperfect. We’ll try to enable those that are closer to being finished, however, to show you how they’ll look in the case that they do ship. With that in mind, read on.

According to strings in recent versions of the Google app on Android, “Personalized speech recognition” will appear in Google Assistant settings with the following description:

Store audio recordings on this device to help Google Assistant get better at recognizing what you say. Audio stays on this device and can be deleted any time by turning off personalized speech recognition. Learn more

That “Learn more” might link to an existing support article about Google’s use of federated learning to improve hotword activations by similarly using “voice recordings stored on users’ devices to refine models like ‘Hey Google’ detection”:

It learns how to adjust the model from the voice data, and sends a summary of the model changes to Google servers. These summaries are aggregated across many users to provide a better model for everyone.
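
The flow described in that support article (train on the device, send only a summary of the model changes, combine summaries across many users) resembles what is commonly called federated averaging. Below is a minimal sketch of that general idea, not Google's implementation: the linear model, the function names, and the learning rate are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (features, target); hypothetical data shape


def local_update(weights: Sequence[float], examples: Sequence[Example],
                 lr: float = 0.1) -> List[float]:
    """Run one pass of SGD on a device and return only the weight delta.

    The raw examples never leave this function; the delta is the
    "summary of the model changes" that would be uploaded.
    """
    w = list(weights)
    for features, target in examples:
        pred = sum(wi * xi for wi, xi in zip(w, features))
        err = pred - target
        w = [wi - lr * err * xi for wi, xi in zip(w, features)]
    return [new - old for new, old in zip(w, weights)]


def aggregate(deltas: Sequence[Sequence[float]], counts: Sequence[int]) -> List[float]:
    """Server side: average the deltas, weighted by each device's example count."""
    total = sum(counts)
    dim = len(deltas[0])
    return [sum(d[i] * n for d, n in zip(deltas, counts)) / total for i in range(dim)]


def federated_round(global_weights: Sequence[float],
                    devices: Sequence[Sequence[Example]]) -> List[float]:
    """One round: every device trains locally, the server applies the mean delta."""
    deltas = [local_update(global_weights, data) for data in devices]
    counts = [len(data) for data in devices]
    mean_delta = aggregate(deltas, counts)
    return [w + d for w, d in zip(global_weights, mean_delta)]
```

Calling `federated_round` repeatedly with the returned weights would move the shared model toward something that fits all devices' data, while the server only ever handles the per-device deltas.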

The upcoming feature looks to expand those machine learning-based improvements beyond “Hey Google” to your actual Assistant commands, especially those with names (e.g., using your voice to message contacts) and frequently spoken words. Audio of past utterances will be stored on-device and analyzed to make transcriptions more accurate in the future.
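
Google has not said how stored utterances would be used to improve transcription. One plausible approach, shown here purely as a toy sketch, is to keep on-device counts of the words a user says often and use them to re-rank the recognizer's candidate transcripts. The `PersonalVocabulary` class, the scoring weights, and the example phrases are all hypothetical.

```python
from collections import Counter
from typing import List, Tuple


class PersonalVocabulary:
    """On-device word counts built from past utterances (illustrative only)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def observe(self, transcript: str) -> None:
        # Record the words of a confirmed past utterance.
        self.counts.update(transcript.lower().split())

    def clear(self) -> None:
        # Mirrors the described behavior: turning the feature off deletes local data.
        self.counts.clear()

    def boost(self, hypothesis: str, weight: float = 0.1, cap: int = 5) -> float:
        # Add a small bonus per frequently used word, capped so that one
        # very common word cannot dominate the score.
        return weight * sum(min(self.counts[w], cap) for w in hypothesis.lower().split())


def rerank(hypotheses: List[Tuple[str, float]], vocab: PersonalVocabulary) -> str:
    """Pick the candidate with the best recognizer score plus personal boost."""
    return max(hypotheses, key=lambda h: h[1] + vocab.boost(h[0]))[0]


vocab = PersonalVocabulary()
for _ in range(3):
    vocab.observe("message Abner")

candidates = [("message abner", 0.40), ("message a burner", 0.45)]
best = rerank(candidates, vocab)
```

In this sketch the recognizer slightly prefers "message a burner", but the stored history tips the choice toward the contact name. After `vocab.clear()`, the ranking falls back to the recognizer's own scores.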

On devices like the 2nd-gen Nest Hub and Mini, Google already uses a machine learning chip that locally processes your most common queries “for a much faster response time.” That concept might now be expanding beyond smart home devices to Android.

Given Google’s stance toward Assistant and voice privacy, this will likely be an opt-in feature, like Assistant settings > “Help Improve Assistant” is today. From the description available today, “audio stays on this device” and is deleted upon disabling the capability. Meanwhile, when you turn off Personalized speech recognition, Google warns that:

If you turn this off, your Assistant will be less accurate at recognizing names and other words that you say frequently. All audio used to improve speech recognition for you will be deleted from this device.

It’s not clear when this capability will launch, or how much of an improvement it will deliver. This comes as Google at I/O 2022 previewed how conversations with Assistant will become more natural next year. Assistant will essentially ignore – and even verbally acknowledge – “umm,” interruptions, natural pauses, and other self-corrections. Today, by comparison, Assistant takes what you said verbatim and issues a response.

Thanks to JEB Decompiler, from which some APK Insight teardowns benefit.

Dylan Roussel and Kyle Bradshaw contributed to this article.


About the Author

Abner Li
