AI startup ElevenLabs launches Scribe model that converts voice to text and supports Ukrainian language with "excellent accuracy"


SOURCE: DEV.UA
FEB 28, 2025

ElevenLabs, an AI startup valued at $3.3 billion whose product was used to dub President Volodymyr Zelenskyy's interview with US podcaster Lex Fridman, has launched Scribe, a new standalone speech-to-text model. Ukrainian is among the languages Scribe transcribes with the lowest error rates.

As TechCrunch reports, ElevenLabs' Scribe model supports 99 languages at launch. The company classifies more than 25 of them as having «excellent accuracy», meaning a word error rate (WER) below 5%. This group includes English, Ukrainian, French, German, Hindi, Indonesian, Japanese, Polish, Portuguese, Spanish, Vietnamese, and others.

The remaining languages fall into the following categories:

  • high accuracy: a word error rate of 5% to 10%;
  • good accuracy: a word error rate of 10% to 20%;
  • moderate accuracy: a word error rate of 25% to 50%.

The company said the model outperformed Google's Gemini 2.0 Flash and OpenAI's Whisper Large V3 on the FLEURS and Common Voice benchmarks across a range of languages.

ElevenLabs previously built a speech-to-text component for its conversational AI agent platform, released last year, but this is the first time the company has released a standalone speech recognition model.

«We want to better understand what you’re saying in a conversation. We’re working to move beyond just generating content and into understanding and transcribing speech. Many people say that converting speech to text is a solved problem. But for many languages, it’s very bad. We believe we can build better speech recognition models because we have internal teams that annotate the data and give us quick feedback,» said CEO Mati Staniszewski.

The model also offers speaker diarization to identify who is speaking, word-level timestamps for accurate captioning, and automatic tagging of audio events such as audience laughter. Customers can also transcribe video content directly in the company's Studio to produce subtitles or captions.

For now, Scribe works only with pre-recorded audio, so it is not yet suited to transcribing meetings or voice notes in real time. The company says it will soon release a low-latency, real-time version of the model.

Scribe costs $0.40 per hour of transcribed audio. Although that price is competitive, TechCrunch notes that some competitors offer cheaper transcription, albeit with different feature sets.

In 2023, ElevenLabs, which develops AI-powered dubbing tools, added support for more than 20 languages, including Ukrainian, Polish, Hindi, Portuguese, Spanish, Japanese, and Arabic.

In late January 2025, ElevenLabs raised $180 million in a Series C round that tripled its valuation to $3.3 billion. The round was co-led by Andreessen Horowitz and Iconiq Growth, with participation from new investors NEA, World Innovation Lab, Valor, Endeavor Catalyst Fund, and Lunate.