Google Gemini: API Provider Benchmarking & Analysis
Analysis of Google Gemini API providers across performance metrics including Artificial Analysis Word Error Rate Index, speed, and price.
Word Error Rate Index
AA-WER v2.0; % of words transcribed incorrectly; lower is better.
Speed Factor
Input audio seconds transcribed per second; higher is better.
Price
USD per 1000 minutes of audio; lower is better.
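The three metrics above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the text normalisation (lowercasing and whitespace splitting) is an assumption, since the page does not describe how AA-WER normalises transcripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalisation: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Input audio seconds transcribed per second of processing time."""
    return audio_seconds / processing_seconds


def price_per_1000_minutes(price_per_minute_usd: float) -> float:
    """Convert a per-minute price into USD per 1000 minutes of audio."""
    return price_per_minute_usd * 1000
```

For example, a 600-second file transcribed in 12 seconds has a speed factor of 50, and one wrong word in a six-word reference gives a WER of 1/6.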
Artificial Analysis Word Error Rate (AA-WER) Index by API
% of words transcribed incorrectly; lower is better. AA-WER v2.0 incorporates 3 datasets: AA-AgentTalk (50%), VoxPopuli-Cleaned-AA (25%), Earnings22-Cleaned-AA (25%)
Note: For Earnings22, if a model cannot reliably handle full-length audio due to time limits, we split the audio into chunks of ~9 minutes (relevant to: Gemini 2.0 Flash Lite, Google; GPT-4o Transcribe, OpenAI; Gemini 2.5 Lite, Google).
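As a rough illustration of the dataset weighting and the chunking note, the sketch below assumes the index is a weighted arithmetic mean of per-dataset WER values and that chunks are fixed-length and non-overlapping. Neither detail is confirmed by the page; the function names and the sample WER values are hypothetical.

```python
# Dataset weights as stated for AA-WER v2.0.
AA_WER_V2_WEIGHTS = {
    "AA-AgentTalk": 0.50,
    "VoxPopuli-Cleaned-AA": 0.25,
    "Earnings22-Cleaned-AA": 0.25,
}


def aa_wer_index(per_dataset_wer: dict[str, float]) -> float:
    """Weighted mean of per-dataset WER (assumed aggregation method)."""
    missing = AA_WER_V2_WEIGHTS.keys() - per_dataset_wer.keys()
    if missing:
        raise ValueError(f"missing datasets: {sorted(missing)}")
    return sum(
        weight * per_dataset_wer[name]
        for name, weight in AA_WER_V2_WEIGHTS.items()
    )


def chunk_boundaries(total_seconds: float, chunk_seconds: float = 9 * 60):
    """Split a recording into consecutive (start, end) spans of ~9 minutes."""
    spans = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        start = end
    return spans


# Hypothetical per-dataset WER values, in percent.
example = {
    "AA-AgentTalk": 4.0,
    "VoxPopuli-Cleaned-AA": 3.0,
    "Earnings22-Cleaned-AA": 5.0,
}
index = aa_wer_index(example)  # 0.5 * 4.0 + 0.25 * 3.0 + 0.25 * 5.0
```

Because AA-AgentTalk carries half the weight, a one-point change on that dataset moves the index twice as far as the same change on either of the other two.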
API Benchmarks
Artificial Analysis Word Error Rate Index vs. Price
Word Error Rate: % of words transcribed incorrectly; lower is better. AA-WER v2.0 incorporates 3 datasets: AA-AgentTalk (50%), VoxPopuli-Cleaned-AA (25%), Earnings22-Cleaned-AA (25%). Price: USD per 1000 minutes of audio; lower is better.
Chart legend: Enhanced; Gemini 2.0 Flash; Gemini 2.0 Flash Lite; Gemini 2.5 Flash; Gemini 2.5 Flash Lite; Gemini 2.5 Pro; Gemini 3 Flash; Gemini 3 Pro; GPT-4o Transcribe; Scribe v1; Scribe v2; Solaria-1; Universal; Universal-3 Pro; Voxtral Small; Whisper (L, v3), Fireworks. The most attractive quadrant is the one with both low word error rate and low price.
Speed Factor
Input audio seconds transcribed per second; higher is better.
Price
Price of Transcription
USD per 1000 minutes of audio; lower is better.
Summary of Key Metrics & Further Information
| Model | Whisper version | Word Error Rate (%) | Median Speed Factor | Price (USD per 1000 minutes) |
|---|---|---|---|---|
| Whisper Large v2 | large-v2 | 4.2% | 29.6 | 6.00 |
| Wizper Large v3 | large-v3 | 4.9% | 232.1 | 0.50 |
| Incredibly Fast Whisper | large-v3 | 5.8% | 56.9 | 1.49 |
| Whisper Large v3 | large-v3 | 10.2% | 3.0 | 4.23 |
| Whisper Large v3 | large-v3 | 4.3% | 61.4 | 1.15 |
| Whisper Large v3 Turbo | v3 Turbo | 4.8% | 387.8 | 0.67 |
| Whisper Large v3 | large-v3 | 4.8% | 203.9 | 1.00 |
| Whisper Large v3 Turbo | v3 Turbo | 4.8% | 284.3 | 1.00 |
| Whisper Large v3 | large-v3 | 7.4% | 109.3 | 1.50 |
| Speechmatics Standard | | 5.3% | 44.5 | 4.00 |
| Speechmatics Enhanced | | 4.3% | 43.7 | 6.70 |
| Nova-2 | | 5.6% | 468.4 | 4.30 |
| Base | | 10.9% | 523.0 | 12.50 |
| Nova-3 | | 6.5% | 251.1 | 4.30 |
| Universal, AssemblyAI | | 4.0% | 115.5 | 2.50 |
| Slam-1 | | 4.1% | 86.4 | 4.50 |
| Universal-3 Pro | | 3.3% | 77.5 | 3.50 |
| Amazon Transcribe | | 4.3% | 17.8 | 24.00 |
| Chirp | | 31.3% | 14.7 | 16.00 |
| Chirp 2, Google | | 6.0% | 18.3 | 16.00 |
| Chirp 3, Google | | 4.6% | 22.1 | 16.00 |
| Scribe v1 | | 3.2% | 36.2 | 6.67 |
| Scribe v2 | | 2.3% | 31.0 | 6.67 |
| Gemini 2.0 Flash | | 4.0% | 50.0 | 1.40 |
| Gemini 2.0 Flash Lite | | 4.0% | 49.6 | 0.19 |
| Gemini 2.5 Flash Lite | | 5.3% | 68.5 | 0.58 |
| Gemini 2.5 Flash | | 5.3% | 53.4 | 1.92 |
| Gemini 2.5 Pro | | 3.1% | 13.0 | 4.80 |
| Gemini 3 Pro | | 2.9% | 5.8 | 7.68 |
| Gemini 3 Flash | | 3.1% | 14.9 | 1.92 |
| GPT-4o Transcribe | | 4.1% | 34.1 | 6.00 |
| GPT-4o Mini Transcribe | | 4.6% | 51.1 | 3.00 |
| Parakeet RNNT 1.1B | | 5.0% | 6.0 | 1.91 |
| Parakeet TDT 0.6B V2, NVIDIA | | 6.8% | 93.6 | 0.00 |
| Canary Qwen 2.5B, NVIDIA | | 4.4% | 5.7 | 0.74 |
| Voxtral Mini | | 3.7% | 72.8 | 1.00 |
| Voxtral Small | | 3.0% | 67.8 | 4.00 |
| Voxtral Mini | | 4.0% | 83.8 | 1.00 |
| Solaria-1, Gladia | | 4.2% | 51.2 | 8.33 |
| Nova 2 Omni | | 5.9% | 35.0 | 1.85 |
| Nova 2 Pro | | 5.0% | 23.0 | 3.10 |
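
One way to read the table is to ask which models are not beaten on both accuracy and price by any other model. The sketch below does this for the Gemini rows only, using the WER, speed factor, and price figures from the table. The `Row` type and the `pareto_front` and `cost_usd` helpers are hypothetical names introduced for illustration, and the Pareto comparison is one possible selection rule, not the method used by the benchmark.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    wer_pct: float           # Word Error Rate (%), lower is better
    speed_factor: float      # Median Speed Factor, higher is better
    usd_per_1000_min: float  # Price (USD per 1000 minutes), lower is better


# Gemini rows copied from the table above.
GEMINI_ROWS = [
    Row("Gemini 2.0 Flash", 4.0, 50.0, 1.40),
    Row("Gemini 2.0 Flash Lite", 4.0, 49.6, 0.19),
    Row("Gemini 2.5 Flash Lite", 5.3, 68.5, 0.58),
    Row("Gemini 2.5 Flash", 5.3, 53.4, 1.92),
    Row("Gemini 2.5 Pro", 3.1, 13.0, 4.80),
    Row("Gemini 3 Pro", 2.9, 5.8, 7.68),
    Row("Gemini 3 Flash", 3.1, 14.9, 1.92),
]


def pareto_front(rows: list[Row]) -> list[Row]:
    """Keep rows that no other row matches or beats on both WER and price."""
    front = []
    for a in rows:
        dominated = any(
            b.wer_pct <= a.wer_pct
            and b.usd_per_1000_min <= a.usd_per_1000_min
            and (b.wer_pct < a.wer_pct
                 or b.usd_per_1000_min < a.usd_per_1000_min)
            for b in rows
        )
        if not dominated:
            front.append(a)
    # Cheapest first, so the list reads from budget to most accurate.
    return sorted(front, key=lambda r: r.usd_per_1000_min)


def cost_usd(row: Row, audio_minutes: float) -> float:
    """Estimated transcription cost for a given number of audio minutes."""
    return row.usd_per_1000_min * audio_minutes / 1000


front_models = [row.model for row in pareto_front(GEMINI_ROWS)]
```

On these rows the front is Gemini 2.0 Flash Lite, Gemini 3 Flash, and Gemini 3 Pro. Speed factor is deliberately left out of the comparison here; adding it as a third criterion would keep more models on the front.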
Speech to Text providers compared: OpenAI, Speechmatics, fal.ai, Replicate, Deepgram, Groq, Fireworks, AssemblyAI, Amazon Bedrock, Google, ElevenLabs, Together.ai, Mistral, DeepInfra, NVIDIA, and Gladia.