Google Gemini: API Provider Benchmarking & Analysis
Analysis of Google Gemini API providers across performance metrics including Artificial Analysis Word Error Rate Index, speed, and price.
Highlights
Artificial Analysis Word Error Rate (AA-WER) Index by API
% of words transcribed incorrectly · Lower is better · AA-WER v2 incorporates 3 datasets: AA-AgentTalk (50%), VoxPopuli-Cleaned-AA (25%), Earnings22-Cleaned-AA (25%)
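The dataset weighting in the caption above can be sketched as a weighted mean. This is a minimal illustration that assumes the index is a plain weighted average of per-dataset WER values; the page states the weights but not the exact aggregation, and the function name `aa_wer_index` and the example scores are hypothetical.

```python
from __future__ import annotations

# Dataset weights as stated for AA-WER v2.
AA_WER_V2_WEIGHTS = {
    "AA-AgentTalk": 0.50,
    "VoxPopuli-Cleaned-AA": 0.25,
    "Earnings22-Cleaned-AA": 0.25,
}


def aa_wer_index(per_dataset_wer: dict[str, float]) -> float:
    """Return the weighted mean of per-dataset WER values (in percent).

    Assumption: the index is a plain weighted average of the three scores.
    """
    missing = set(AA_WER_V2_WEIGHTS) - set(per_dataset_wer)
    if missing:
        raise ValueError(f"missing datasets: {sorted(missing)}")
    return sum(
        weight * per_dataset_wer[name]
        for name, weight in AA_WER_V2_WEIGHTS.items()
    )


# Hypothetical per-dataset scores, not taken from the benchmark.
example_scores = {
    "AA-AgentTalk": 2.0,
    "VoxPopuli-Cleaned-AA": 4.0,
    "Earnings22-Cleaned-AA": 6.0,
}
example_index = aa_wer_index(example_scores)
print(f"{example_index:.1f}%")
```

Because AA-AgentTalk carries half the weight, a change on that dataset moves the index twice as much as the same change on either of the other two.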
Note: For Earnings22, if a model cannot reliably handle full-length audio due to time limits, we chunk to ~9 minutes (relevant to: Nova 2 Pro, Amazon; Gemini 2.5 Lite, Google; Voxtral Mini Transcribe, Mistral; GPT-4o Transcribe, OpenAI; GPT-4o Mini Transcribe, OpenAI; Gemini 2.0 Flash Lite, Google). For models with even shorter time limits, we chunk to ~30 seconds (relevant to: Canary Qwen 2.5B, NVIDIA).
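The chunking described in the note can be sketched as follows. The note gives only the approximate chunk lengths (~9 minutes and ~30 seconds), so this sketch assumes fixed-length, non-overlapping chunks with a shorter final chunk; the helper name `chunk_bounds` is hypothetical, and the actual splitting strategy used for the benchmark may differ.

```python
from __future__ import annotations


def chunk_bounds(
    duration_s: float, max_chunk_s: float = 540.0
) -> list[tuple[float, float]]:
    """Split an audio duration into consecutive (start, end) windows in seconds.

    Assumption: chunks are fixed-length and non-overlapping; the last chunk
    holds whatever audio remains. The default of 540 s matches ~9 minutes.
    """
    if duration_s < 0 or max_chunk_s <= 0:
        raise ValueError("duration must be >= 0 and chunk length must be > 0")
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


# A hypothetical 20-minute recording split into ~9-minute chunks.
nine_minute_chunks = chunk_bounds(1200.0)

# The same idea with the ~30-second limit mentioned in the note.
thirty_second_chunks = chunk_bounds(75.0, max_chunk_s=30.0)
```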
API Benchmarks
Artificial Analysis Word Error Rate Index vs. Price
% of words transcribed incorrectly · Lower is better · AA-WER v2 incorporates 3 datasets: AA-AgentTalk (50%), VoxPopuli-Cleaned-AA (25%), Earnings22-Cleaned-AA (25%) · USD per 1000 minutes of audio
Scatter plot of AA-WER Index against price for each model, with the most attractive quadrant highlighted.
Speed Factor
Input audio seconds transcribed per second · Higher is better
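The speed factor definition above is a simple ratio, sketched below. The function names are hypothetical, and the sketch assumes the processing time is measured end to end for the whole request.

```python
def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Input audio seconds transcribed per second of processing time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def estimated_processing_seconds(audio_seconds: float, factor: float) -> float:
    """Invert the ratio: estimated time to transcribe audio at a given speed factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return audio_seconds / factor


# A hypothetical 10-minute file transcribed in 10 seconds.
example_factor = speed_factor(600.0, 10.0)

# One hour of audio at a speed factor of 70.3 (the value listed for
# Gemini 2.5 Flash Lite in the summary table).
one_hour_estimate = estimated_processing_seconds(3600.0, 70.3)
print(f"{example_factor:.1f}x, {one_hour_estimate:.1f} s")
```

At a speed factor of 70.3, an hour of audio would take roughly 51 seconds, which gives a feel for how the listed values translate into waiting time.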
Price
Price of Transcription
USD per 1000 minutes of audio · Lower is better
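Since prices are quoted per 1000 minutes of audio, the cost of a given workload follows by simple scaling. The sketch below uses a hypothetical helper name and ignores any minimum charges, rounding, or tiered pricing a provider might apply.

```python
def transcription_cost_usd(audio_minutes: float, price_per_1000_min: float) -> float:
    """Scale a USD-per-1000-minutes price to a given amount of audio."""
    if audio_minutes < 0 or price_per_1000_min < 0:
        raise ValueError("inputs must be non-negative")
    return audio_minutes / 1000.0 * price_per_1000_min


# One hour of audio at 11.39 USD per 1000 minutes (the price listed for
# Gemini 2.5 Pro in the summary table).
one_hour_cost = transcription_cost_usd(60.0, 11.39)
print(f"${one_hour_cost:.4f}")
```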
Summary of Key Metrics & Further Information
| Model | AA-WER Index | Speed Factor | Price (USD per 1000 min) |
|---|---|---|---|
| Qwen3.5 Omni Flash | 13.5% | 79.8 | 0.00 |
| Qwen3.5 Omni Plus | 3.5% | 93.7 | 0.00 |
| Nova 2 Pro | 4.9% | 22.7 | 3.10 |
| Amazon Transcribe | 4.1% | 19.3 | 24.00 |
| Universal-3 Pro | 3.1% | 90.2 | 3.50 |
| Universal, AssemblyAI | 3.8% | 112.5 | 2.50 |
| MAI-Transcribe-1.5 | 2.4% | 269.6 | 6.00 |
| MAI-Transcribe-1 | 2.6% | 55.6 | 6.00 |
| Nova-3 | 5.2% | 472.0 | 4.30 |
| Nova-2 | 5.3% | 428.6 | 4.30 |
| Base | 10.7% | 330.3 | 12.50 |
| Scribe v2 | 2.2% | 37.4 | 3.67 |
| Scribe v1 | 3.0% | 39.8 | 6.67 |
| Solaria-1, Gladia | 4.1% | 61.3 | 4.07 |
| Gemini 3.1 Pro Preview (High) | 2.8% | 6.3 | 18.15 |
| Gemini 3.1 Pro Preview (Low) | 3.6% | 7.1 | 7.72 |
| Gemini 3 Flash (High) | 2.9% | 18.2 | 13.70 |
| Gemini 2.5 Flash Lite | 5.2% | 70.3 | 6.56 |
| Gemini 2.5 Flash | 5.1% | 62.2 | 6.66 |
| Gemini 2.5 Pro | 2.9% | 13.7 | 11.39 |
| Gemini 2.0 Flash Lite | 3.8% | 56.4 | 0.19 |
| Gemini 3.1 Flash-Lite Preview (Minimal) | 3.4% | 70.6 | 5.83 |
| Gradium Speech-to-Text | 8.4% | 2.3 | 13.00 |
| Grok Speech to Text, xAI | 4.0% | 102.6 | 1.67 |
| Voxtral Mini Transcribe 2 | 3.6% | 70.7 | 3.00 |
| Voxtral Mini Transcribe | 3.5% | 61.3 | 2.00 |
| Voxtral Small | 2.8% | 67.1 | 4.00 |
| Voxtral Mini | 3.8% | 70.2 | 1.00 |
| Modulate STT Batch English VFast | 13.0% | 193.6 | 0.42 |
| Parakeet TDT 0.6B V3, Togetherai | 4.5% | 919.9 | 1.50 |
| Canary Qwen 2.5B, NVIDIA | 4.3% | 5.3 | 0.74 |
| Parakeet TDT 0.6B V2, NVIDIA | 6.4% | 101.9 | 0.00 |
| Parakeet RNNT 1.1B | 5.4% | 5.9 | 1.91 |
| GPT-4o Transcribe | 4.0% | 31.7 | 6.00 |
| GPT-4o Mini Transcribe | 4.5% | 46.0 | 3.00 |
| Smallest AI Pulse Pro | 3.3% | 233.9 | 5.00 |
| Rev AI | 5.9% | 12.6 | 3.33 |
| Smallest AI Pulse | 4.4% | 138.9 | 5.00 |
| Speechmatics Standard | 5.1% | 68.1 | 4.00 |
| Speechmatics Enhanced | 4.0% | 58.2 | 6.70 |
| Whisper Large v3 Turbo | 4.6% | 108.4 | 0.67 |
| Whisper Large v3 Turbo | 4.7% | 213.0 | 1.00 |
| Wizper Large v3 | 4.7% | 232.0 | 0.50 |
| Incredibly Fast Whisper | 5.7% | 56.5 | 1.49 |
| Whisper Large v3 | 10.1% | 2.8 | 4.23 |
| Whisper Large v3 | 4.1% | 79.7 | 1.15 |
| Whisper Large v3 | 4.6% | 300.3 | 1.00 |
| Whisper Large v3 | 4.5% | 393.9 | 1.50 |
| Whisper Large v2 | 4.1% | 27.1 | 6.00 |
Speech to Text providers compared: Google.
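One way to read the error-rate versus price trade-off for the Gemini rows in the summary table is to keep only the models that no other Gemini model beats on both metrics at once. The sketch below is an illustrative Pareto filter, not the method used to draw the "most attractive quadrant" on the page; the type and function names are hypothetical, and the values are copied from the table, reading its numeric columns as word error rate, speed factor, and price in that order.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    wer_pct: float           # AA-WER Index, lower is better
    usd_per_1000_min: float  # price, lower is better


# Gemini rows from the summary table (word error rate and price only).
GEMINI_MODELS = [
    Model("Gemini 3.1 Pro Preview (High)", 2.8, 18.15),
    Model("Gemini 3.1 Pro Preview (Low)", 3.6, 7.72),
    Model("Gemini 3 Flash (High)", 2.9, 13.70),
    Model("Gemini 2.5 Flash Lite", 5.2, 6.56),
    Model("Gemini 2.5 Flash", 5.1, 6.66),
    Model("Gemini 2.5 Pro", 2.9, 11.39),
    Model("Gemini 2.0 Flash Lite", 3.8, 0.19),
    Model("Gemini 3.1 Flash-Lite Preview (Minimal)", 3.4, 5.83),
]


def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b on both metrics and strictly better on one."""
    no_worse = a.wer_pct <= b.wer_pct and a.usd_per_1000_min <= b.usd_per_1000_min
    better = a.wer_pct < b.wer_pct or a.usd_per_1000_min < b.usd_per_1000_min
    return no_worse and better


def pareto_frontier(models: list) -> list:
    """Return the models not dominated by any other, sorted by ascending price."""
    frontier = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(frontier, key=lambda m: m.usd_per_1000_min)


frontier_names = [m.name for m in pareto_frontier(GEMINI_MODELS)]
for name in frontier_names:
    print(name)
```

With these figures, four Gemini models remain on the frontier: Gemini 2.0 Flash Lite, Gemini 3.1 Flash-Lite Preview (Minimal), Gemini 2.5 Pro, and Gemini 3.1 Pro Preview (High). The filter ignores speed, so a model dropped here may still be the better choice when latency matters.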