Transcribing speech is never neutral. It shapes power and bias
SOURCE: THECONVERSATION.COM
MAY 08, 2026
Published: May 6, 2026 9.19pm BST
Earlier this year I gave a talk about my research at Oxford’s All Souls College, and worked with a chef to design an accompanying menu.
Thinking about my work in southwest Western Australia, I typed “Boorloo”, the Nyungar name for the City of Perth.
Autocorrect had other ideas. It replaced it with “Barolo” – which, I thought, made for a fitting wine choice on the night.
It was an amusing moment, but also a revealing one. The system’s dictionary, trained largely on mainstream English data, didn’t know what Boorloo was, so it reached for a more familiar alternative. This seemingly minor miscorrection offers a glimpse into how language technologies are shaped – including which words they recognise, and which they overlook.
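The mechanism can be illustrated with a toy example. The sketch below is not how any real autocorrect works internally: the word list, the `suggest` helper and the similarity cutoff are all assumptions made for illustration, and Python's standard `difflib` stands in for whatever matching a production system uses. It shows only the general point that a system can pick from nothing but the words it already knows.

```python
import difflib


def suggest(word, vocabulary, cutoff=0.5):
    """Return the word itself if known, else the closest known word.

    Hypothetical helper: 'vocabulary' and 'cutoff' are illustrative
    assumptions, not parameters of any real autocorrect system.
    """
    lowered = word.lower()
    if lowered in vocabulary:
        return lowered
    # get_close_matches ranks candidates by string similarity and
    # returns at most n of those scoring at or above the cutoff.
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else lowered


# A toy dictionary built only from "mainstream" entries (assumed).
vocabulary = ["barolo", "bordeaux", "burgundy", "perth", "shiraz"]

without_boorloo = suggest("Boorloo", vocabulary)
with_boorloo = suggest("Boorloo", vocabulary + ["boorloo"])
```

With this toy list, the closest known entry to "boorloo" is "barolo", so that is what comes back. Once "boorloo" is added to the list, the word is left alone.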
Part of the explanation lies in how technologies such as automatic speech recognition convert spoken language into text. Transcription is often presented as a straightforward technical exercise: you listen, you write down what was said.
But every transcription protocol carries within it assumptions about what standardised speech looks like. In the words of linguist Mary Bucholtz, “all transcripts take sides”.
In practice, the standardised language is almost always the “prestige dialect” of powerful institutions. For English, that may be the variety used in the Oxford English Dictionary or by the BBC.
Recent research from Cornell University and Carnegie Mellon shows what this means in concrete terms.
When people watched a video presentation with automatically generated, error-prone subtitles, they consistently rated the speaker as less clear and less knowledgeable than did viewers who saw the same presentation with accurate captions. The quality of the transcription affected not only how viewers perceived the speaker, but also how they perceived the content of the talk.
The stakes are particularly high for First Nations people in Australia. Here, the mismatch between the conventions of transcription and the actual practice of communication can be severe.
In many Indigenous communities, pauses and silences themselves function as meaningful acts of communication.
In places such as Wadeye in Australia’s Northern Territory, a sustained silence is not a gap to be filled. Instead it is part of the structure of what is being communicated.
Transcription systems developed in northern hemisphere academic contexts will generally render those silences with hesitation markers, ellipses, or editorial cuts, stripping out meaning.
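One alternative is to record a silence as a timed event rather than discard it. The sketch below assumes a recogniser that supplies start and end times for each word. The `Word` type, the `render_with_pauses` helper and the half-second threshold are all hypothetical, and the bracketed-seconds notation is borrowed from conversation-analytic transcription conventions. It is one possible convention, not a recommendation for any particular community or language.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float


def render_with_pauses(words, min_pause=0.5):
    """Join words into a transcript, keeping silences as timed pauses.

    A gap of at least min_pause seconds between two words is written
    as "(seconds)" instead of being dropped. The threshold is an
    illustrative assumption and would need to be set with speakers.
    """
    parts = []
    prev_end = None
    for w in words:
        if prev_end is not None:
            gap = w.start - prev_end
            if gap >= min_pause:
                parts.append(f"({gap:.1f})")
        parts.append(w.text)
        prev_end = w.end
    return " ".join(parts)


# Invented timings, for illustration only.
words = [Word("yes", 0.0, 0.4), Word("I", 3.0, 3.1), Word("know", 3.2, 3.6)]
transcript = render_with_pauses(words)
```

Here the 2.6-second silence after "yes" stays in the transcript as "(2.6)". Writing the threshold into the code is itself a way of making the convention explicit, since a reader can see exactly which silences were kept and which were not.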
Common words in languages other than English (such as “Boorloo” for Perth) go unrecognised. They may be mistranscribed to fit the language models on which technology is trained.
In legal, medical and welfare contexts, transcription can determine someone’s liberty, diagnosis, or entitlements. Here, systematic misrepresentation of non-standardised language is a justice issue.
Tools using artificial intelligence (AI) for transcription are now being deployed in hospitals and GP practices across Australia, resulting in mistakes, omissions and so-called hallucinations. A recent study of several AI scribes found all of them made errors in transcription and note-taking.
About half of the samples also included factual inaccuracies. Hallucinations occurred frequently, with the tools fabricating diagnoses or listing medications that were never taken. In one case, a male patient was even recorded as being on the contraceptive pill.
Making things better includes developing more diverse models for automated speech recognition.
But for anyone producing transcripts right now – in journalism, oral history, the law, clinical records, or sociolinguistic research – certain obligations apply. Make your conventions explicit, acknowledge what your system cannot represent, and resist the impulse to normalise speech into something legible to an imagined standard reader.
Rendering speech into writing may seem natural, but writing is itself a technology. The task is not to achieve perfect objectivity, but to be visible and accountable for decisions about what is included and excluded, and how those decisions are made.
Associate Professor, Chair of Linguistics and Director of Language Lab, The University of Western Australia
Celeste Rodriguez Louro receives funding from the Australian Research Council and Google.

University of Western Australia provides funding as a founding partner of The Conversation AU.
https://doi.org/10.64628/AA.j4r737t6k