Google Gemini: API Provider Benchmarking & Analysis
Analysis of Google Gemini API providers across performance metrics, including the Artificial Analysis Word Error Rate (AA-WER) Index, speed, and price.
Highlights
Artificial Analysis Word Error Rate (AA-WER) Index by API
% of words transcribed incorrectly · Lower is better · AA-WER v2 incorporates 3 datasets: AA-AgentTalk (50%), VoxPopuli-Cleaned-AA (25%), Earnings22-Cleaned-AA (25%)
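The caption defines WER as the share of words transcribed incorrectly and combines three weighted datasets. The sketch below is a minimal illustration that assumes two things. First, it uses the standard word-level WER: (substitutions + deletions + insertions) divided by the number of reference words, computed with a Levenshtein alignment. Second, it treats the AA-WER v2 index as a plain weighted mean of per-dataset WERs using the stated 50/25/25 weights. Artificial Analysis's actual text normalization and aggregation may differ, and the function names are hypothetical.

```python
# Hypothetical helpers: standard word-level WER plus an ASSUMED weighted AA-WER index.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference word count."""
    # Assumption: lowercase + whitespace split is the only normalization applied.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of row i, prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution (or exact match)
        prev = curr
    return prev[-1] / len(ref)


# Dataset weights as stated in the AA-WER v2 caption.
AA_WER_V2_WEIGHTS = {
    "AA-AgentTalk": 0.50,
    "VoxPopuli-Cleaned-AA": 0.25,
    "Earnings22-Cleaned-AA": 0.25,
}


def aa_wer_index(per_dataset_wer: dict[str, float]) -> float:
    """ASSUMED aggregation: weighted mean of per-dataset WERs."""
    return sum(w * per_dataset_wer[name] for name, w in AA_WER_V2_WEIGHTS.items())
```

For example, dropping one "the" from "the cat sat on the mat" gives one deletion over six reference words, a WER of about 16.7%. Note that insertions can push WER above 100%.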
Note: For Earnings22, if a model cannot reliably handle full-length audio due to time limits, we split the audio into ~9-minute chunks (relevant to: Nova 2 Pro, Amazon; Gemini 2.5 Flash Lite, Google; Voxtral Mini Transcribe, Mistral; GPT-4o Transcribe, OpenAI; GPT-4o Mini Transcribe, OpenAI; Gemini 2.0 Flash Lite, Google). For models with even shorter time limits, we split it into ~30-second chunks (relevant to: Canary Qwen 2.5B, NVIDIA).
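The note does not describe how chunk boundaries are chosen. As a rough illustration only, here is a naive fixed-length splitter that produces consecutive windows of at most ~9 minutes or ~30 seconds. It is an assumption, not the benchmark's method: it ignores overlap, silence detection and word boundaries, and the function and constant names are hypothetical.

```python
# Hypothetical fixed-length chunking; the benchmark's real splitting logic is not documented here.

NINE_MINUTES_S = 9 * 60   # ~9-minute chunks for models with moderate time limits
THIRTY_SECONDS_S = 30     # ~30-second chunks for models with shorter time limits


def chunk_boundaries(duration_s: float, chunk_s: float) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) windows of at most chunk_s seconds."""
    if duration_s < 0 or chunk_s <= 0:
        raise ValueError("duration must be >= 0 and chunk length must be > 0")
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)  # last window may be shorter
        bounds.append((start, end))
        start = end
    return bounds
```

A 20-minute (1200 s) earnings call split at ~9 minutes yields three windows: 0 to 540 s, 540 to 1080 s, and 1080 to 1200 s. Presumably each chunk is transcribed separately and the transcripts are joined before scoring, though the source does not say so.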
API Benchmarks
Artificial Analysis Word Error Rate Index vs. Price
% of words transcribed incorrectly (lower is better) vs. USD per 1000 minutes of audio (lower is better) · AA-WER v2 incorporates 3 datasets: AA-AgentTalk (50%), VoxPopuli-Cleaned-AA (25%), Earnings22-Cleaned-AA (25%)
[Chart: AA-WER Index plotted against price for each model and provider; the "most attractive quadrant" combines low error rate with low price.]
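The chart highlights a "most attractive quadrant" but does not state where its dividing lines sit. A minimal sketch, assuming the cut-offs are the median WER and median price of the plotted models (a guess, not the chart's documented thresholds), can flag models in the low-error, low-price corner. The sample values come from a subset of Gemini rows in the Summary of Key Metrics table, and the function name is hypothetical.

```python
from statistics import median

# Subset of Gemini rows from the summary table: name -> (WER %, USD per 1000 minutes).
SAMPLE = {
    "Gemini 3.1 Pro Preview (High)": (2.9, 18.15),
    "Gemini 3 Flash (High)": (3.1, 13.70),
    "Gemini 2.5 Pro": (3.0, 11.39),
    "Gemini 2.0 Flash": (3.9, 1.40),
    "Gemini 2.0 Flash Lite": (3.9, 0.19),
    "Gemini 3.1 Flash-Lite Preview (Minimal)": (3.5, 5.83),
    "Gemini 2.5 Flash": (5.2, 6.66),
}


def attractive_quadrant(models: dict[str, tuple[float, float]]) -> list[str]:
    """Models at or below BOTH the median WER and the median price (ASSUMED cut-offs)."""
    wer_cut = median(w for w, _ in models.values())
    price_cut = median(p for _, p in models.values())
    return sorted(name for name, (w, p) in models.items()
                  if w <= wer_cut and p <= price_cut)
```

With this sample the medians are 3.5% WER and 6.66 USD per 1000 minutes, and only Gemini 3.1 Flash-Lite Preview (Minimal) clears both. Different cut-offs, such as fixed thresholds, would change the result.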
Speed Factor
Input audio seconds transcribed per second · Higher is better
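Speed factor is a ratio: seconds of input audio divided by seconds of processing. The sketch below shows the arithmetic, plus a median over several runs to mirror the table's "Median Speed Factor" column. What counts as processing time (for example, whether network latency is included) and how runs are collected are assumptions, and the helper names are hypothetical.

```python
from statistics import median


def speed_factor(audio_seconds: float, wall_clock_seconds: float) -> float:
    """Seconds of input audio transcribed per second of processing time."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall-clock time must be positive")
    return audio_seconds / wall_clock_seconds


def median_speed_factor(runs: list[tuple[float, float]]) -> float:
    """Median speed factor over hypothetical (audio_seconds, wall_clock_seconds) runs."""
    return median(speed_factor(a, t) for a, t in runs)


def processing_seconds(audio_seconds: float, factor: float) -> float:
    """Expected processing time for a clip at a given speed factor."""
    return audio_seconds / factor
```

Under this definition, a 10-minute clip transcribed in 10 seconds has a speed factor of 60. A model listed at 77.2 (Gemini 2.5 Flash in the table) would process one hour of audio in roughly 47 seconds.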
Price
Price of Transcription
USD per 1000 minutes of audio · Lower is better
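Prices here are normalized to USD per 1000 minutes of audio. The helpers below are a minimal sketch for estimating a workload's cost from that figure and for converting per-minute or per-hour quotes into the same unit. They are hypothetical and ignore billing details such as minimum billable increments or per-request fees.

```python
# Hypothetical helpers for working with "USD per 1000 minutes of audio" prices.

def cost_usd(audio_minutes: float, usd_per_1000_min: float) -> float:
    """Cost of transcribing audio_minutes at a price quoted per 1000 minutes."""
    return audio_minutes / 1000 * usd_per_1000_min


def per_1000_min_from_per_minute(usd_per_minute: float) -> float:
    """Convert a per-minute quote to USD per 1000 minutes."""
    return usd_per_minute * 1000


def per_1000_min_from_per_hour(usd_per_hour: float) -> float:
    """Convert a per-hour quote to USD per 1000 minutes (60 minutes per hour)."""
    return usd_per_hour * 1000 / 60
```

For example, 100 hours (6,000 minutes) at 6.66 USD per 1000 minutes costs about $39.96. A quote of $0.006 per minute or $0.36 per hour both normalize to 6.00 USD per 1000 minutes.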
Summary of Key Metrics & Further Information
| Model | Whisper version | Word Error Rate | Median Speed Factor | Price (USD per 1000 minutes) |
|---|---|---|---|---|
| Qwen3.5 Omni Flash | | 13.6% | 93.1 | 0.00 |
| Qwen3.5 Omni Plus | | 3.7% | 97.8 | 0.00 |
| Gemini 3.1 Pro Preview (High) | | 2.9% | 6.2 | 18.15 |
| Gemini 3.1 Pro Preview (Low) | | 3.8% | 6.3 | 7.72 |
| Gemini 3 Flash (High) | | 3.1% | 15.3 | 13.70 |
| Gemini 3 Pro (High) | | 2.9% | 6.5 | 18.40 |
| Gemini 2.5 Flash Lite | | 5.3% | 66.6 | 6.56 |
| Gemini 2.5 Flash | | 5.2% | 77.2 | 6.66 |
| Gemini 2.5 Pro | | 3.0% | 12.5 | 11.39 |
| Gemini 2.0 Flash Lite | | 3.9% | 51.4 | 0.19 |
| Gemini 2.0 Flash | | 3.9% | 52.8 | 1.40 |
| Gemini 3.1 Flash-Lite Preview (Minimal) | | 3.5% | 78.7 | 5.83 |
| Pulse STT | | 4.5% | 139.8 | 5.00 |
| Voxtral Mini Transcribe 2 | | 3.7% | 76.8 | 3.00 |
| Voxtral Mini Transcribe | | 3.7% | 53.4 | 1.00 |
| Voxtral Small | | 2.9% | 67.5 | 4.00 |
| Voxtral Mini | | 4.0% | 80.1 | 1.00 |
| Universal-3 Pro | | 3.3% | 96.3 | 3.50 |
| Universal, AssemblyAI | | 3.9% | 101.8 | 2.50 |
| Soniox V4 | | 4.0% | 42.5 | 1.66 |
| Scribe v2 | | 2.2% | 30.7 | 6.67 |
| Scribe v1 | | 3.1% | 39.4 | 6.67 |
| Nova 2 Pro | | 5.0% | 23.5 | 3.10 |
| Gradium Speech-to-Text | | 8.5% | 2.3 | 13.00 |
| Parakeet TDT 0.6B V3, Togetherai | | 4.6% | 313.7 | 1.50 |
| Canary Qwen 2.5B, NVIDIA | | 4.3% | 5.2 | 0.74 |
| Parakeet TDT 0.6B V2, NVIDIA | | 6.5% | 98.1 | 0.00 |
| Parakeet RNNT 1.1B | | 5.5% | 5.6 | 1.91 |
| Solaria-1, Gladia | | 4.2% | 47.2 | 4.07 |
| GPT-4o Transcribe | | 4.1% | 31.6 | 6.00 |
| GPT-4o Mini Transcribe | | 4.6% | 50.9 | 3.00 |
| Nova-3 | | 5.3% | 122.0 | 4.30 |
| Nova-2 | | 5.4% | 559.8 | 4.30 |
| Base | | 10.8% | 658.1 | 12.50 |
| Whisper Large v3 Turbo | v3 Turbo | 4.8% | 266.3 | 0.67 |
| Whisper Large v3 Turbo | v3 Turbo | 4.8% | 245.7 | 1.00 |
| Wizper Large v3 | large-v3 | 4.8% | 237.7 | 0.50 |
| Incredibly Fast Whisper | large-v3 | 5.8% | 53.6 | 1.49 |
| Whisper Large v3 | large-v3 | 10.2% | 2.6 | 4.23 |
| Whisper Large v3 | large-v3 | 4.2% | 94.0 | 1.15 |
| Whisper Large v3 | large-v3 | 4.7% | 257.0 | 1.00 |
| Whisper Large v3 | large-v3 | 4.7% | 251.0 | 1.50 |
| Whisper Large v2 | large-v2 | 4.2% | 28.6 | 6.00 |
| Amazon Transcribe | | 4.2% | 18.3 | 24.00 |
| Speechmatics Standard | | 5.1% | 71.7 | 4.00 |
| Speechmatics Enhanced | | 4.1% | 68.3 | 6.70 |
| Rev AI | | 6.0% | 12.4 | 20.00 |
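One way to read the Gemini rows of the summary table is to ask which models no other Gemini model beats on both error rate and price. The sketch below is a minimal Pareto-frontier filter over those rows, with values copied from the table. It is an illustrative analysis, not part of the benchmark's methodology. It treats WER and price as the only criteria and ignores speed, and the function name is hypothetical.

```python
# Gemini rows from the summary table: name -> (WER %, USD per 1000 minutes).
GEMINI = {
    "Gemini 3.1 Pro Preview (High)": (2.9, 18.15),
    "Gemini 3.1 Pro Preview (Low)": (3.8, 7.72),
    "Gemini 3 Flash (High)": (3.1, 13.70),
    "Gemini 3 Pro (High)": (2.9, 18.40),
    "Gemini 2.5 Flash Lite": (5.3, 6.56),
    "Gemini 2.5 Flash": (5.2, 6.66),
    "Gemini 2.5 Pro": (3.0, 11.39),
    "Gemini 2.0 Flash Lite": (3.9, 0.19),
    "Gemini 2.0 Flash": (3.9, 1.40),
    "Gemini 3.1 Flash-Lite Preview (Minimal)": (3.5, 5.83),
}


def pareto_front(rows: dict[str, tuple[float, float]]) -> list[str]:
    """Names not dominated on (WER, price), where lower is better for both.

    A row is dominated if another row is <= on both metrics and < on at least one.
    """
    front = []
    for name, (w, p) in rows.items():
        dominated = any(
            w2 <= w and p2 <= p and (w2 < w or p2 < p)
            for other, (w2, p2) in rows.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)
```

On these rows, four Gemini models remain undominated: 2.0 Flash Lite (cheapest), 3.1 Flash-Lite Preview (Minimal), 2.5 Pro, and 3.1 Pro Preview (High) (lowest WER). Each of the others is matched or beaten on both axes by one of these.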
Speech to Text providers compared: Alibaba Cloud, Google, Smallest.ai, Mistral, AssemblyAI, Soniox, ElevenLabs, Amazon Bedrock, Gradium, Together.ai, Replicate, DeepInfra, NVIDIA, Gladia, OpenAI, Deepgram, Groq, Fireworks, fal.ai, Speechmatics, Rev AI.