Google Chirp: API Provider Benchmarking & Analysis
Analysis of Google Chirp API providers across performance metrics, including the Artificial Analysis Word Error Rate (AA-WER) Index, speed, and price.
Highlights
Word Error Rate Index
AA-WER v2 · % of words transcribed incorrectly · Lower is better
Speed Factor
Input audio seconds transcribed per second · Higher is better
Price
USD per 1000 minutes of audio · Lower is better
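The Word Error Rate highlighted above ("% of words transcribed incorrectly") is conventionally computed as a word-level edit distance: (substitutions + deletions + insertions) divided by the number of reference words. The sketch below shows that standard definition. It assumes simple lowercase, whitespace-split tokenization; the text normalization Artificial Analysis applies before scoring is not described on this page, and `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard word-level WER: (S + D + I) / number of reference words.

    Assumption: lowercase + whitespace tokenization only; the benchmark's
    own normalization rules (punctuation, numerals, etc.) are not modeled.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Row of the Levenshtein table: at the start of iteration i, prev[j] is the
    # edit distance between the first i-1 reference words and first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word dropped)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the quarterly revenue grew by five percent"
    hyp = "the quarterly revenue grew five per cent"
    # 1 deletion ("by") + 1 substitution ("percent" -> "per") + 1 insertion ("cent") = 3 / 7
    print(f"{word_error_rate(ref, hyp):.1%}")  # prints 42.9%
```

Because insertions are counted against the reference length, WER can exceed 100% when a model hallucinates extra words.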
Artificial Analysis Word Error Rate (AA-WER) Index by API
% of words transcribed incorrectly · Lower is better · AA-WER v2 incorporates 3 datasets: AA-AgentTalk (50%), VoxPopuli-Cleaned-AA (25%), Earnings22-Cleaned-AA (25%)
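The AA-WER v2 weighting above can be read as a weighted average of per-dataset WERs. The sketch below assumes the index is a plain weighted arithmetic mean of the three dataset scores (rather than, say, a mean weighted by word count); the dataset names and weights come from the line above, while `aa_wer_v2` and the example scores are hypothetical.

```python
# Weights as stated for AA-WER v2 on this page.
AA_WER_V2_WEIGHTS: dict[str, float] = {
    "AA-AgentTalk": 0.50,
    "VoxPopuli-Cleaned-AA": 0.25,
    "Earnings22-Cleaned-AA": 0.25,
}


def aa_wer_v2(per_dataset_wer: dict[str, float]) -> float:
    """Weighted mean of per-dataset WER (%), assuming a simple linear weighting."""
    missing = set(AA_WER_V2_WEIGHTS) - set(per_dataset_wer)
    if missing:
        raise KeyError(f"missing WER for: {sorted(missing)}")
    return sum(weight * per_dataset_wer[name] for name, weight in AA_WER_V2_WEIGHTS.items())


if __name__ == "__main__":
    # Hypothetical per-dataset scores, for illustration only.
    scores = {"AA-AgentTalk": 4.0, "VoxPopuli-Cleaned-AA": 6.0, "Earnings22-Cleaned-AA": 2.0}
    print(aa_wer_v2(scores))  # prints 4.0  (0.5*4.0 + 0.25*6.0 + 0.25*2.0)
```

Under this reading, AA-AgentTalk counts twice as much as either of the other datasets, so a model's agent-conversation accuracy moves the index the most.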
Note: For Earnings22, if a model cannot reliably handle full-length audio because of duration limits, we split the audio into ~9-minute chunks (applies to: GPT-4o Transcribe, OpenAI; GPT-4o Mini Transcribe, OpenAI; Voxtral Mini Transcribe, Mistral; Nova 2 Pro, Amazon). For models with even shorter duration limits, we split into ~30-second chunks (applies to: Canary Qwen 2.5B, NVIDIA).
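The chunking described in the note can be sketched as cutting an audio file into consecutive fixed-length windows, transcribing each, and joining the transcripts in order. This assumes plain fixed-length cuts with no overlap or silence-aware boundaries; the page only states the approximate chunk lengths (~9 minutes, ~30 seconds), and `chunk_boundaries` is a hypothetical helper.

```python
import math


def chunk_boundaries(duration_s: float, chunk_s: float) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive windows of at most chunk_s seconds.

    Assumption: fixed-length, non-overlapping cuts; the last window is shorter
    when the duration is not an exact multiple of chunk_s.
    """
    if duration_s <= 0 or chunk_s <= 0:
        raise ValueError("duration_s and chunk_s must be positive")
    n_chunks = math.ceil(duration_s / chunk_s)
    return [(i * chunk_s, min((i + 1) * chunk_s, duration_s)) for i in range(n_chunks)]


if __name__ == "__main__":
    one_hour = 3600
    nine_min = chunk_boundaries(one_hour, 9 * 60)   # ~9-minute chunks
    thirty_s = chunk_boundaries(one_hour, 30)       # ~30-second chunks
    print(len(nine_min), nine_min[-1])  # prints 7 (3240, 3600)
    print(len(thirty_s))                # prints 120
```

Naive cuts can split a word across two chunks, which is one reason chunked runs may score slightly worse than full-length transcription.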
API Benchmarks
Artificial Analysis Word Error Rate Index vs. Price
Figure: AA-WER (% of words transcribed incorrectly, lower is better; AA-WER v2 incorporates AA-AgentTalk 50%, VoxPopuli-Cleaned-AA 25%, Earnings22-Cleaned-AA 25%) plotted against price (USD per 1000 minutes of audio), with the most attractive quadrant highlighted. Models plotted: Amazon Transcribe; Canary Qwen 2.5B, Replicate; Chirp; Chirp 2; Chirp 3; Enhanced; Gemini 3 Flash (High); Gemini 3 Pro (High); Gemini 3.1 Pro Preview (High); Gemini 3.1 Pro Preview (Low); GPT-4o Mini Transcribe; GPT-4o Transcribe; Nova 2 Pro; Nova-3; Rev AI; Scribe v2; Solaria-1; Soniox V4; Universal; Universal-3 Pro; Voxtral Mini Transcribe; Voxtral Mini Transcribe 2; Voxtral Small; Whisper (L, v3), fal.ai; Whisper (L, v3), Fireworks; Wizper (L, v3), fal.ai.
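One way to make "most attractive" precise for a WER vs. price chart is a Pareto frontier: keep only the models that no other model beats on both WER and price. This is a swapped-in technique for illustration, not the chart's own quadrant definition (which the page does not specify). The rows below are the three Chirp entries from the summary table; `Row` and `pareto_frontier` are hypothetical names.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    wer_pct: float    # AA-WER, % (lower is better)
    price_usd: float  # USD per 1000 minutes of audio (lower is better)


def dominates(a: Row, b: Row) -> bool:
    """True if a is no worse than b on both metrics and strictly better on at least one."""
    return (
        a.wer_pct <= b.wer_pct
        and a.price_usd <= b.price_usd
        and (a.wer_pct < b.wer_pct or a.price_usd < b.price_usd)
    )


def pareto_frontier(rows: list[Row]) -> list[Row]:
    """Rows that no other row dominates on (WER, price)."""
    return [r for r in rows if not any(dominates(other, r) for other in rows if other is not r)]


# Values copied from the summary table on this page.
CHIRP_ROWS = [
    Row("Chirp", 31.3, 16.00),
    Row("Chirp 2, Google", 5.9, 16.00),
    Row("Chirp 3, Google", 4.5, 16.00),
]

if __name__ == "__main__":
    print([r.model for r in pareto_frontier(CHIRP_ROWS)])  # prints ['Chirp 3, Google']
```

All three Chirp versions are priced at 16.00 USD per 1000 minutes, so within the Chirp family the frontier reduces to the lowest-WER model; adding rows from other providers would change the result.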
Speed Factor
Input audio seconds transcribed per second · Higher is better
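Speed Factor is a ratio: seconds of input audio divided by seconds spent transcribing. Inverting it gives an estimate of how long a file takes to process. The sketch assumes the factor is measured against end-to-end wall-clock time per request (the page does not say whether network latency is included); the example factors are the median values for Chirp 3 (29.5) and Chirp (13.9) from the summary table, and the function names are hypothetical.

```python
def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Input audio seconds transcribed per second of processing time."""
    return audio_seconds / processing_seconds


def processing_seconds(audio_seconds: float, factor: float) -> float:
    """Estimated time to transcribe audio_seconds of audio at a given speed factor."""
    return audio_seconds / factor


if __name__ == "__main__":
    one_hour = 3600
    print(speed_factor(one_hour, 120))                      # prints 30.0
    print(f"{processing_seconds(one_hour, 29.5):.0f} s")    # Chirp 3: prints 122 s
    print(f"{processing_seconds(one_hour, 13.9):.0f} s")    # Chirp: prints 259 s
```

Because the table reports a median, individual requests can be noticeably faster or slower than this estimate.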
Price
USD per 1000 minutes of audio · Lower is better
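Since prices are quoted per 1000 minutes of audio, estimating the cost of a workload is a linear scaling. The sketch assumes strictly per-minute billing with no minimum charge or rounding increments, which real providers may apply; the 16.00 USD figure is the Chirp 3 price from the summary table, and `transcription_cost_usd` is a hypothetical helper.

```python
def transcription_cost_usd(price_per_1000_min: float, audio_minutes: float) -> float:
    """Cost of transcribing audio_minutes at a price quoted per 1000 minutes.

    Assumption: linear per-minute billing, no minimums or billing increments.
    """
    return price_per_1000_min * audio_minutes / 1000


if __name__ == "__main__":
    workload_minutes = 10_000 * 60  # 10,000 hours of audio
    print(transcription_cost_usd(16.00, workload_minutes))  # Chirp 3: prints 9600.0
```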
Summary of Key Metrics & Further Information
| Model | Whisper version | Word Error Rate | Median Speed Factor | Price (USD per 1000 minutes) |
|---|---|---|---|---|
| Whisper Large v2 | large-v2 | 4.2% | 29.1 | 6.00 |
| Wizper Large v3 | large-v3 | 4.8% | 210.4 | 0.50 |
| Incredibly Fast Whisper | large-v3 | 5.8% | 51.9 | 1.49 |
| Whisper Large v3 | large-v3 | 10.2% | 2.7 | 4.23 |
| Whisper Large v3 | large-v3 | 4.2% | 91.1 | 1.15 |
| Whisper Large v3 Turbo | v3 Turbo | 4.8% | 399.3 | 0.67 |
| Whisper Large v3 | large-v3 | 4.7% | 99.1 | 1.00 |
| Whisper Large v3 Turbo | v3 Turbo | 4.8% | 124.9 | 1.00 |
| Whisper Large v3 | large-v3 | 7.4% | 134.9 | 1.50 |
| Speechmatics Standard | | 5.1% | 69.5 | 4.00 |
| Speechmatics Enhanced | | 4.1% | 53.7 | 6.70 |
| Nova-2 | | 5.4% | 518.4 | 4.30 |
| Base | | 10.8% | 565.0 | 12.50 |
| Nova-3 | | 5.3% | 138.5 | 4.30 |
| Universal, AssemblyAI | | 3.9% | 108.3 | 2.50 |
| Universal-3 Pro | | 3.3% | 99.0 | 3.50 |
| Amazon Transcribe | | 4.2% | 19.3 | 24.00 |
| Rev AI | | 6.0% | 12.8 | 20.00 |
| Chirp | | 31.3% | 13.9 | 16.00 |
| Chirp 2, Google | | 5.9% | 18.9 | 16.00 |
| Chirp 3, Google | | 4.5% | 29.5 | 16.00 |
| Scribe v1 | | 3.1% | 38.4 | 6.67 |
| Scribe v2 | | 2.2% | 21.3 | 6.67 |
| Gemini 2.0 Flash | | 3.9% | 53.4 | 1.40 |
| Gemini 2.0 Flash Lite | | 3.9% | 50.8 | 0.19 |
| Gemini 2.5 Flash Lite | | 5.3% | 67.5 | 6.56 |
| Gemini 2.5 Flash | | 5.2% | 66.2 | 6.66 |
| Gemini 2.5 Pro | | 3.0% | 11.8 | 11.39 |
| Gemini 3 Pro (High) | | 2.9% | 6.1 | 18.40 |
| Gemini 3 Flash (High) | | 3.1% | 15.3 | 13.70 |
| Gemini 3.1 Pro Preview (High) | | 2.9% | 6.4 | 18.15 |
| Gemini 3.1 Flash-Lite Preview (Minimal) | | 3.5% | 78.9 | 5.83 |
| Gemini 3.1 Pro Preview (Low) | | 3.8% | 6.8 | 7.72 |
| GPT-4o Transcribe | | 4.1% | 31.4 | 6.00 |
| GPT-4o Mini Transcribe | | 4.6% | 53.0 | 3.00 |
| Parakeet RNNT 1.1B | | 5.5% | 6.1 | 1.91 |
| Parakeet TDT 0.6B V2, NVIDIA | | 6.5% | 93.4 | 0.00 |
| Canary Qwen 2.5B, NVIDIA | | 4.3% | 5.7 | 0.74 |
| Voxtral Mini Transcribe | | 3.7% | 78.6 | 1.00 |
| Voxtral Small | | 2.9% | 68.8 | 4.00 |
| Voxtral Mini | | 4.0% | 81.4 | 1.00 |
| Voxtral Mini Transcribe 2 | | 3.7% | 75.3 | 3.00 |
| Solaria-1, Gladia | | 4.2% | 49.5 | 4.07 |
| Nova 2 Omni | | 5.8% | 35.9 | 1.85 |
| Nova 2 Pro | | 5.0% | 23.8 | 3.10 |
| Soniox V4 | | 4.0% | 32.7 | 1.66 |
| Pulse STT | | 4.5% | 171.8 | 8.00 |
| Qwen3.5 Omni Flash | | 13.6% | 91.4 | 0.00 |
| Qwen3.5 Omni Plus | | 3.7% | 96.5 | 0.00 |
Speech to Text providers compared: OpenAI, Speechmatics, fal.ai, Replicate, Deepgram, Groq, Fireworks, AssemblyAI, Amazon Bedrock, Rev AI, Google, ElevenLabs, Together.ai, Mistral, DeepInfra, NVIDIA, Gladia, Soniox, Smallest.ai, Alibaba Cloud.