
Speechmatics: API Provider Benchmarking & Analysis
Highlights
Artificial Analysis Word Error Rate (AA-WER) Index by API
Artificial Analysis Word Error Rate (AA-WER) Index: Measures transcription accuracy across 3 datasets to evaluate models in real-world speech with diverse accents, domain-specific language, and challenging channel & acoustic conditions.
AA-WER is calculated as an audio-duration-weighted average of WER across ~2 hours from three datasets: VoxPopuli, Earnings-22, and AMI-SDM. See methodology for more detail.
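The duration-weighted averaging described above can be sketched in a few lines. The dataset names come from the text; the WER and duration values below are illustrative placeholders, not published figures.

```python
def duration_weighted_wer(results):
    """Average per-dataset WERs, weighting each dataset by its audio duration.

    `results` maps dataset name -> (wer, audio_seconds).
    """
    total_seconds = sum(seconds for _, seconds in results.values())
    return sum(wer * seconds for wer, seconds in results.values()) / total_seconds

# Placeholder numbers for illustration only:
example = {
    "VoxPopuli":   (0.10, 2400),  # (WER, audio seconds)
    "Earnings-22": (0.16, 2400),
    "AMI-SDM":     (0.22, 2400),
}
print(round(duration_weighted_wer(example), 3))  # equal durations -> plain mean, 0.16
```

With equal durations the weighted average reduces to a plain mean; datasets contributing more audio pull the index proportionally harder.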
Artificial Analysis Word Error Rate (AA-WER) Index by Individual Dataset
Artificial Analysis Word Error Rate (AA-WER) Index vs Other Metrics
Artificial Analysis Word Error Rate Index vs. Price
Artificial Analysis Word Error Rate Index vs. Speed Factor
Speed Factor
Speed Factor Variance

Speed Factor, Over Time
Speed Factor vs. Price
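The page does not define Speed Factor explicitly; a common convention, assumed here, is audio duration divided by wall-clock processing time, so a Speed Factor of 60 means one hour of audio is transcribed in one minute.

```python
def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Ratio of audio length to transcription time (assumed definition).

    Values above 1 mean faster than real time.
    """
    return audio_seconds / processing_seconds

print(speed_factor(3600, 60))  # 60.0: one hour of audio in one minute
```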
Price
Price of Transcription
Note: Groq charges for a minimum of 10s per request.
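Prices in the table below are quoted per 1,000 minutes of audio. A minimal cost sketch, using the minimum-billing behavior noted above (the price passed in is a placeholder, not any specific provider's rate):

```python
def billed_cost_usd(audio_seconds: float, usd_per_1000_min: float,
                    min_billable_seconds: float = 10.0) -> float:
    """Cost of one request, with a per-request minimum billable duration."""
    billable = max(audio_seconds, min_billable_seconds)
    return usd_per_1000_min * (billable / 60) / 1000

# A 3-second clip is billed as if it were 10 seconds:
print(billed_cost_usd(3, 1.00))
```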
| Provider | Model | Whisper version | Word Error Rate (%) | Median Speed Factor | Price (USD per 1000 minutes) |
|---|---|---|---|---|---|
|  | Whisper Large v2 | large-v2 | 15.8% | 28.1 | 6.00 |
|  | Whisper Large v2 | large-v2 | 27.2% | 34.2 | 6.00 |
|  | Whisper Large v3 | large-v3 | 16.8% | 291.9 | 0.50 |
|  | Incredibly Fast Whisper | large-v3 | 18.2% | 63.1 | 1.49 |
|  | Whisper Large v2 | large-v2 | 15.8% | 2.4 | 3.47 |
|  | Whisper Large v3 | large-v3 | 24.6% | 2.8 | 4.23 |
|  | WhisperX | large-v3 | 16.3% | 18.6 | 1.09 |
|  | Whisper (M) | medium |  |  | 2.68 |
|  | Whisper (S) | small |  |  | 1.37 |
|  | Whisper Large v3 | large-v3 | 16.8% | 225.4 | 1.85 |
|  | Distil-Whisper |  |  |  | 0.33 |
|  | Whisper Large v3 | large-v3 | 16.8% | 106.9 | 0.45 |
|  | Whisper Large v3 | large-v3 | 16.8% | 138.1 | 1.15 |
|  | Whisper Large v3 Turbo | v3 Turbo |  | 268.3 | 0.67 |
|  | Whisper Large v3 | large-v3 |  | 461.3 | 1.00 |
|  | Whisper Large v3 Turbo | v3 Turbo | 17.8% | 415.3 | 1.00 |
|  | Whisper Large v3 | large-v3 | 16.8% | 52.5 | 1.67 |
|  | Whisper Large v3 | large-v3 | 24.6% | 123.5 | 1.50 |
| Speechmatics | Standard |  | 16.0% | 17.8 | 4.00 |
| Speechmatics | Enhanced |  | 14.4% | 17.8 | 6.70 |
| Microsoft Azure | Azure AI Speech Service |  | 17.2% | 2.0 | 16.67 |
|  | Nano |  | 16.3% | 85.3 | 2.00 |
| AssemblyAI | Universal 2 |  | 14.5% | 85.8 | 2.50 |
|  | Slam-1 |  | 15.2% | 59.3 | 4.50 |
|  | Nova-2 |  | 17.3% | 459.4 | 4.30 |
|  | Base |  | 21.9% | 491.9 | 12.50 |
|  | Nova-3 |  | 18.3% | 612.2 | 4.30 |
| Gladia | Gladia v2 | whisper-v2-variant | 16.7% | 44.4 | 10.20 |
|  | Amazon Transcribe |  | 14.0% | 19.3 | 24.00 |
| Fish Audio | Fish Speech to Text |  |  | 26.9 | 0.00 |
| Rev AI | Rev AI |  | 15.2% |  | 20.00 |
|  | Chirp |  | 16.9% | 13.9 | 16.00 |
| Google | Chirp 2 |  | 11.6% | 17.8 | 16.00 |
| Google | Chirp 3 |  | 15.0% | 31.0 | 16.00 |
| ElevenLabs | Scribe |  |  | 45.5 | 6.67 |
|  | Gemini 2.0 Flash |  | 17.9% | 55.8 | 1.40 |
|  | Gemini 2.0 Flash Lite |  | 16.6% | 58.3 | 0.19 |
|  | Gemini 2.5 Flash Lite |  | 16.1% | 84.7 | 0.58 |
|  | Gemini 2.5 Flash |  | 19.2% | 60.4 | 1.92 |
|  | Gemini 2.5 Pro |  | 15.0% | 10.4 | 0.00 |
|  | GPT-4o Transcribe |  | 21.3% | 27.8 | 6.00 |
|  | GPT-4o Mini Transcribe |  | 20.1% | 31.9 | 3.00 |
| IBM | Granite Speech 3.3 8B |  | 15.7% |  | 0.00 |
|  | Parakeet RNNT 1.1B |  |  | 6.4 | 1.91 |
| NVIDIA | Parakeet TDT 0.6B V2 |  |  | 62.5 | 0.00 |
| NVIDIA | Canary Qwen 2.5B |  | 13.2% | 4.2 | 0.00 |
|  | Voxtral Mini |  | 15.8% | 59.2 | 1.00 |
|  | Voxtral Small |  | 14.7% | 69.8 | 4.00 |
|  | Voxtral Small |  | 14.7% | 16.3 | 3.00 |
|  | Voxtral Mini |  | 15.8% | 52.6 | 1.00 |
|  | Qwen3 ASR Flash |  | 15.0% |  | 1.92 |
|  | Qwen3 Omni |  | 52.3% |  | 0.00 |
|  | Qwen3 Omni Captioner |  |  |  | 5.72 |
Speech to Text providers compared: OpenAI, Speechmatics, Microsoft Azure, AssemblyAI, fal.ai, Replicate, Deepgram, Gladia, Groq, Deepinfra, Fireworks, Amazon Bedrock, Fish Audio, Rev AI, Google, ElevenLabs, SambaNova, IBM, Together.ai, Mistral, NVIDIA, and Alibaba Cloud.
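For background, the word error rates above follow the standard WER definition: the word-level edit distance (substitutions + deletions + insertions) between the hypothesis and the reference transcript, divided by the number of reference words. A minimal sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1] / len(ref)

print(wer("the quick brown fox", "the quick brown box"))  # 0.25
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why models with severe hallucination show outlying scores.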