Speech to Text AI Model & Provider Leaderboard
Compare word error rate, speed, and pricing across Speech to Text models and providers.
For further details, see our methodology page.
Highlights
AA-WER Streaming Index vs. Time to Final Transcription
AA-WER Streaming Index - Final Transcription
AA-WER Streaming - Final Transcription: AA-AgentTalk Dataset
AA-WER Streaming Index vs. Time to First Partial Transcription After Speech End
AA-WER Streaming Index - First Partial Transcription After Speech End
AA-WER Streaming - First Partial Transcription After Speech End: AA-AgentTalk Dataset
AA-WER Streaming - Final Transcription compared to First Partial Transcription After Speech End
Latency
Time to Final Transcription
Time to First Partial Transcription After Speech End
Price
Price of Transcription
Speech to Text Streaming models compared: AssemblyAI U3 Realtime Pro, Cartesia Ink, ElevenLabs Scribe v2 Realtime, Gladia Solaria 1 Realtime, Deepgram Flux, Speechmatics Realtime Enhanced, Nemotron 3 ASR 80ms, Nemotron 3 ASR 160ms, Nemotron 3 ASR 560ms, Nemotron 3 ASR 1120ms, Soniox Realtime, Inworld STT 1 Realtime, OpenAI GPT Realtime, Chirp 3 Streaming, Voxtral Mini Transcribe Realtime, RevAI Streaming, Deepgram Nova-3 Realtime, Pulse STT Realtime, Gradium STT Realtime, Qwen3 ASR Flash Realtime, Amazon Transcribe Streaming, Azure STT Real-time Transcription, Cartesia Ink-2 (external endpoints), Ink-2 Turn Detection Eager End, Cartesia Ink-2 (semantic endpoints).