Meta gives away a free video dataset of 846 hours


SOURCE: ANALYTICSINDIAMAG.COM
FEB 04, 2022

On February 1, Meta announced a new resource to advance fairness in speech recognition: the company’s AI team released a research paper on a project called ‘Casual Conversations’, an exhaustive dataset with manual transcriptions to help researchers evaluate the accuracy of audio models.

Machine learning models are only as good as their data. When a model recognises the voice patterns of white speakers but neglects those of a particular community, race or gender, it points to a gap in the training data that produces unfair outcomes. In the context of ML, fairness refers to the attempts made to correct these biases in the underlying data. Ideally, data should be collected sensitively and be equally representative of communities regardless of disability, ethnicity and gender.

Research on fairness in Automatic Speech Recognition (ASR) systems pales in comparison to the studies in the area of facial recognition.

Past studies

According to a 2020 Stanford study, the speech recognition systems of the biggest tech companies, including Amazon, Apple, Google, Microsoft and IBM, misidentified 19% of words when the user was white and 35% when the user was Black. However, only two companies responded to the study. Amazon said it was constantly improving its speech recognition service, while Google acknowledged the shortcomings and said it had been taking a long, hard look at the model’s flaws.

In 2014, Google researchers published a paper detailing the reasons behind these biases. Titled ‘Discriminative Pronunciation Modelling for Dialectical Speech Recognition’, the paper described how African American Vernacular English (AAVE), a dialect mostly used by African Americans in casual speech, differs from Standard American English (SAE) in pronunciation and vocabulary. The accuracy of an ASR system drops for a specific dialect when that dialect is under-represented in the training data.

Diverse dataset

The Casual Conversations dataset comprises 846 hours of footage across 45,000 videos, each around a minute long on average. It captures more than 3,000 participants of different ages, ethnicities and genders speaking on random subjects. In addition, the researchers categorised the collected speech by the participants’ skin tones. While skin tone matters more as a variable in computer vision, a participant’s skin tone could be interrelated with variables in speech.

The researchers evaluated several speech recognition models, including a LibriSpeech model, a supervised video model, a semi-supervised video model and a semi-supervised teacher video model. The results showed big accuracy gaps in terms of gender but not across age groups. As it turned out, skin tone was an important factor driving the differences in performance among subgroups. The study concluded that the larger and more varied the dataset, the lower the comparative error rates of the ASR model: a dataset must represent a diverse range of attributes across subgroups to achieve more evenly distributed accuracy.
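To illustrate the kind of evaluation described above, here is a minimal sketch of how word error rate (WER) could be compared across subgroups. The sample data, the "subgroup" label and the helper functions are hypothetical and are not taken from Meta's actual pipeline; they only show the general technique of grouping transcription errors by a demographic attribute.

```python
# Minimal sketch: comparing ASR word error rate (WER) across subgroups.
# All data and label names below are hypothetical, for illustration only.

from collections import defaultdict

def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word sequences."""
    m, n = len(ref_words), len(hyp_words)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

def wer_by_subgroup(samples):
    """samples: dicts with 'reference', 'hypothesis' and 'subgroup' keys."""
    errors, totals = defaultdict(int), defaultdict(int)
    for s in samples:
        ref = s["reference"].lower().split()
        hyp = s["hypothesis"].lower().split()
        errors[s["subgroup"]] += word_edit_distance(ref, hyp)
        totals[s["subgroup"]] += len(ref)
    return {group: errors[group] / totals[group] for group in totals}

# Hypothetical toy data: one transcription error in the first sample.
samples = [
    {"reference": "turn the lights off", "hypothesis": "turn the light off",
     "subgroup": "darker skin tone"},
    {"reference": "play some jazz music", "hypothesis": "play some jazz music",
     "subgroup": "lighter skin tone"},
]
print(wer_by_subgroup(samples))  # per-subgroup WER, e.g. {'darker skin tone': 0.25, ...}
```

A gap between the per-subgroup WER values is the kind of disparity the Casual Conversations paper measures; a fair model would keep those numbers close together.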

Prospects

Last October, Speechmatics, a UK-based speech recognition company, said its speech recognition system had an accuracy of 83% for African American users. Speechmatics beat Microsoft (73%), Amazon and Google (69% each), IBM (62%) and Apple (55%) hands down on accuracy. In other words, the company’s model failed to recognise 17% of the words spoken by Black voices, compared with Amazon and Google’s 31%.

Speechmatics said it had trained its ML models on reams of unlabelled data from podcasts and social media to expose the software to different accents, styles and grammar. “It would be good if people were open-sourcing test sets that let you evaluate how well you’re doing on this front,” Will Williams, the company’s vice-president of ML, said.
