Gladia: API Provider Benchmarking & Analysis
Analysis of Gladia API providers across performance metrics, including the Artificial Analysis Word Error Rate (AA-WER) Index, speed, and price.
Highlights
Artificial Analysis Word Error Rate (AA-WER) Index by API
% of words transcribed incorrectly · Lower is better · AA-WER v2 incorporates 3 datasets: AA-AgentTalk (50%), VoxPopuli-Cleaned-AA (25%), Earnings22-Cleaned-AA (25%)
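As a rough illustration of the metric described above, the sketch below computes a word error rate as word-level edit distance divided by reference length, then combines per-dataset values using the stated 50/25/25 weights. It assumes the index is a plain weighted average and that text is split on whitespace with no further normalization; the benchmark's actual normalization and aggregation may differ. The function names and the example values are hypothetical.

```python
# Dataset weights as stated for AA-WER v2.
WEIGHTS = {
    "AA-AgentTalk": 0.50,
    "VoxPopuli-Cleaned-AA": 0.25,
    "Earnings22-Cleaned-AA": 0.25,
}


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def weighted_wer_index(per_dataset_wer: dict) -> float:
    """Combine per-dataset WER values (percent) with the stated weights.

    Assumption: the index is a simple weighted average.
    """
    missing = WEIGHTS.keys() - per_dataset_wer.keys()
    if missing:
        raise ValueError(f"missing datasets: {sorted(missing)}")
    return sum(WEIGHTS[name] * per_dataset_wer[name] for name in WEIGHTS)


# Hypothetical per-dataset results, not taken from the benchmark.
example = {
    "AA-AgentTalk": 4.0,
    "VoxPopuli-Cleaned-AA": 6.0,
    "Earnings22-Cleaned-AA": 10.0,
}
print(weighted_wer_index(example))
print(word_error_rate("the cat sat on mat", "the cat sit on"))
```

Because AA-AgentTalk carries half the weight under this assumption, a one-point change on that dataset moves the combined figure twice as much as the same change on either of the other two.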
Note: For Earnings22, if a model cannot reliably handle full-length audio due to time limits, we split the audio into chunks of ~9 minutes (relevant to: Nova 2 Pro, Amazon; Voxtral Mini Transcribe, Mistral; GPT-4o Transcribe, OpenAI; GPT-4o Mini Transcribe, OpenAI). For models with even shorter time limits, we split it into chunks of ~30 seconds (relevant to: Canary Qwen 2.5B, NVIDIA).
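The chunking in the note above can be pictured with a small helper that splits a recording's duration into consecutive spans no longer than a given limit. This is only a sketch: the function name is hypothetical, and it cuts at fixed offsets, whereas the benchmark may well cut at pauses or sentence boundaries.

```python
def chunk_spans(total_seconds: float, max_chunk_seconds: float) -> list:
    """Split [0, total_seconds] into consecutive (start, end) spans.

    Every span is at most max_chunk_seconds long; the last one may be shorter.
    Assumption: fixed-offset cuts, with no attempt to cut at silence.
    """
    if total_seconds < 0 or max_chunk_seconds <= 0:
        raise ValueError("durations must be non-negative and the limit positive")
    spans = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_chunk_seconds, float(total_seconds))
        spans.append((start, end))
        start = end
    return spans


# A 20-minute recording with a ~9-minute limit, then with a ~30-second limit.
print(chunk_spans(20 * 60, 9 * 60))
print(len(chunk_spans(20 * 60, 30)))
```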
API Benchmarks
Artificial Analysis Word Error Rate Index vs. Price
% of words transcribed incorrectly · Lower is better · AA-WER v2 incorporates 3 datasets: AA-AgentTalk (50%), VoxPopuli-Cleaned-AA (25%), Earnings22-Cleaned-AA (25%) · USD per 1000 minutes of audio
Most attractive quadrant: low word error rate at a low price.
Speed Factor
Input audio seconds transcribed per second · Higher is better
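To make the definition concrete, the sketch below computes a speed factor from a measured run and, going the other way, estimates processing time from a published factor. The function names are hypothetical; the 60.4 figure is the Solaria-1 value from the summary table, and the estimate ignores queueing, upload time, and any variation with file length.

```python
def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Input audio seconds transcribed per second of processing time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def estimated_processing_seconds(audio_seconds: float, factor: float) -> float:
    """Rough processing time for a file, given a speed factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return audio_seconds / factor


# A 10-minute file transcribed in 10 seconds has a speed factor of 60.
print(speed_factor(600, 10))

# One hour of audio at a speed factor of 60.4 takes roughly a minute.
print(round(estimated_processing_seconds(3600, 60.4), 1))
```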
Price
Price of Transcription
USD per 1000 minutes of audio · Lower is better
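Since prices are quoted per 1000 minutes, converting to a cost for a given workload is a single proportion. The sketch below shows that arithmetic; the function name and the 2,500-minute workload are hypothetical, while the two prices are the Solaria-1 and Solaria-3 figures from the summary table. It assumes linear pricing with no minimums, tiers, or rounding per request.

```python
def transcription_cost_usd(audio_minutes: float, usd_per_1000_minutes: float) -> float:
    """Cost of transcribing audio_minutes at a per-1000-minute price.

    Assumption: linear pricing with no volume tiers or per-request minimums.
    """
    if audio_minutes < 0 or usd_per_1000_minutes < 0:
        raise ValueError("inputs must be non-negative")
    return audio_minutes / 1000.0 * usd_per_1000_minutes


# Hypothetical monthly workload of 2,500 minutes of audio.
workload_minutes = 2500
for name, price in [("Solaria-1", 4.07), ("Solaria-3", 10.16)]:
    cost = transcription_cost_usd(workload_minutes, price)
    print(f"{name}: ${cost:.2f}")
```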
Summary of Key Metrics & Further Information
Provider | AA-WER (%) | Speed Factor | Price (USD per 1000 min) | Further Details | |
|---|---|---|---|---|---|
Qwen3.5 Omni Flash | 13.5% | 78.6 | 0.00 | ||
Qwen3.5 Omni Plus | 3.5% | 100.7 | 0.00 | ||
Nova 2 Pro | 4.9% | 22.8 | 3.10 | ||
Amazon Transcribe | 4.1% | 17.7 | 24.00 | ||
Universal-3 Pro | 3.1% | 90.3 | 3.50 | ||
Universal, AssemblyAI | 3.8% | 116.8 | 2.50 | ||
MAI-Transcribe-1.5 | 2.4% | 260.0 | 6.00 | ||
MAI-Transcribe-1 | 2.6% | 56.6 | 6.00 | ||
transcribe-03-2026 | 4.6% | 38.9 | 0.00 | ||
Nova-3 | 5.2% | 475.6 | 4.30 | ||
Nova-2 | 5.3% | 549.6 | 4.30 | ||
Base | 10.7% | 353.0 | 12.50 | ||
Scribe v2 | 2.2% | 32.2 | 3.67 | ||
Scribe v1 | 3.0% | 39.9 | 6.67 | ||
Solaria-1, Gladia | 4.1% | 60.4 | 4.07 | ||
Solaria-3, Gladia | 3.2% | 61.0 | 10.16 | ||
Gemini 3.1 Pro Preview (High) | 2.8% | 6.4 | 18.15 | ||
Gemini 3.1 Pro Preview (Low) | 3.6% | 7.5 | 7.72 | ||
Gemini 3 Flash (High) | 2.9% | 16.0 | 13.70 | ||
Gemini 2.5 Flash Lite | 5.2% | 60.9 | 6.56 | ||
Gemini 2.5 Flash | 5.1% | 69.3 | 6.66 | ||
Gemini 2.5 Pro | 2.9% | 14.0 | 11.39 | ||
Gemini 3.1 Flash-Lite Preview (Minimal) | 3.4% | 74.6 | 5.83 | ||
Gradium Speech-to-Text | 8.4% | 2.3 | 13.00 | ||
Grok Speech to Text, xAI | 4.0% | 106.4 | 1.67 | ||
Voxtral Mini Transcribe 2 | 3.6% | 79.7 | 3.00 | ||
Voxtral Mini Transcribe | 3.5% | 67.2 | 2.00 | ||
Voxtral Small | 2.8% | 55.5 | 4.00 | ||
Voxtral Mini | 3.8% | 75.4 | 1.00 | ||
Modulate STT Batch English VFast | 4.2% | 175.1 | 0.42 | ||
Parakeet TDT 0.6B V3, Togetherai | 4.5% | 917.8 | 1.50 | ||
Canary Qwen 2.5B, NVIDIA | 4.3% | 5.9 | 0.74 | ||
Parakeet TDT 0.6B V2, NVIDIA | 6.4% | 97.8 | 0.00 | ||
Parakeet RNNT 1.1B | 5.4% | 6.3 | 1.91 | ||
GPT-4o Transcribe | 4.0% | 32.5 | 6.00 | ||
GPT-4o Mini Transcribe | 4.5% | 48.5 | 3.00 | ||
Smallest AI Pulse Pro | 2.4% | 297.5 | 3.50 | ||
Resonant-1 | 3.4% | 264.4 | 3.60 | ||
Rev AI | 5.9% | 10.6 | 3.33 | ||
Smallest AI Pulse | 4.4% | 229.4 | 5.00 | ||
Soniox v5 Async | 3.8% | 33.6 | 1.66 | ||
Soniox V4 | 3.9% | 48.4 | 1.66 | ||
Speechmatics Standard | 5.1% | 80.4 | 4.00 | ||
Speechmatics Enhanced | 4.0% | 63.3 | 6.70 | ||
Whisper Large v3 Turbo | 4.6% | 110.0 | 0.67 | ||
Wizper Large v3 | 4.7% | 181.0 | 0.50 | ||
Incredibly Fast Whisper | 5.7% | 50.6 | 1.49 | ||
Whisper Large v3 | 10.1% | 2.9 | 4.23 | ||
Whisper Large v3 | 4.1% | 77.7 | 1.15 | ||
Whisper Large v3 | 4.5% | 412.4 | 1.50 | ||
Whisper Large v2 | 4.1% | 27.4 | 6.00 | ||
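
One way to read the table against the error-rate-versus-price chart is to screen rows by an upper bound on each axis. The sketch below does this for a handful of rows copied from the table, in the order error rate, speed factor, price. The thresholds (4.0% and 5.00 USD per 1000 minutes) are arbitrary illustrative choices, not the chart's actual quadrant boundaries, and the `Row` type and function name are hypothetical.

```python
from typing import NamedTuple


class Row(NamedTuple):
    name: str
    wer_percent: float
    speed_factor: float
    usd_per_1000_min: float


# A few rows copied from the summary table.
ROWS = [
    Row("Scribe v2", 2.2, 32.2, 3.67),
    Row("Solaria-1, Gladia", 4.1, 60.4, 4.07),
    Row("Solaria-3, Gladia", 3.2, 61.0, 10.16),
    Row("Amazon Transcribe", 4.1, 17.7, 24.00),
    Row("Nova-3", 5.2, 475.6, 4.30),
    Row("Universal-3 Pro", 3.1, 90.3, 3.50),
]


def attractive_quadrant(rows, max_wer_percent, max_usd_per_1000_min):
    """Keep rows at or below both bounds, sorted by error rate (lowest first)."""
    picked = [
        row
        for row in rows
        if row.wer_percent <= max_wer_percent
        and row.usd_per_1000_min <= max_usd_per_1000_min
    ]
    return sorted(picked, key=lambda row: row.wer_percent)


# Illustrative thresholds only.
for row in attractive_quadrant(ROWS, max_wer_percent=4.0, max_usd_per_1000_min=5.0):
    print(row.name, row.wer_percent, row.usd_per_1000_min)
```

Adding a lower bound on `speed_factor` to the filter would extend the same screen to workloads where turnaround time matters.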
Speech to Text providers compared: Gladia.