Whisper: API Provider Benchmarking & Analysis
Highlights
Summary Analysis
Artificial Analysis Word Error Rate (AA-WER) Index
Artificial Analysis Word Error Rate (AA-WER) Index: Measures transcription accuracy across 3 datasets to evaluate models in real-world speech with diverse accents, domain-specific language, and challenging channel & acoustic conditions.
AA-WER is calculated as an audio-duration-weighted average of WER across ~2 hours from three datasets: VoxPopuli, Earnings-22, and AMI-SDM. See methodology for more detail.
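The aggregation described above can be sketched in a few lines. This is an illustrative sketch only: the dataset names are the real ones, but the per-dataset WERs, durations, and the simple word-level edit distance are hypothetical stand-ins for the actual methodology.

```python
# Sketch of WER and a duration-weighted WER index, per the description above.
# WER = word-level edit distance / reference word count; the index weights
# each dataset's WER by its audio duration. Durations/WERs below are made up.

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate via Levenshtein distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # edit distances for empty ref prefix
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1] / len(ref)

def duration_weighted_wer(results):
    """results: list of (wer, audio_duration_seconds) tuples, one per dataset."""
    total = sum(d for _, d in results)
    return sum(w * d for w, d in results) / total

# Hypothetical per-dataset WERs and durations (~2 hours total):
datasets = [(0.12, 40 * 60),  # VoxPopuli
            (0.20, 45 * 60),  # Earnings-22
            (0.25, 35 * 60)]  # AMI-SDM
index = duration_weighted_wer(datasets)
```

Weighting by duration means a long, hard dataset (e.g. far-field meeting audio) moves the index more than a short, clean one.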
Artificial Analysis Word Error Rate (AA-WER) Index by Individual Dataset
Artificial Analysis Word Error Rate Index vs. Price
Artificial Analysis Word Error Rate Index vs. Speed Factor
Speed Factor vs. Price
Speed
Speed Factor
Speed Factor, Variance

Speed Factor, Over Time
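The page does not define Speed Factor explicitly; assuming it is the ratio of audio duration to wall-clock transcription time (so a factor of 60 means an hour of audio is transcribed in one minute), a minimal sketch with hypothetical request timings:

```python
# Speed Factor sketch under the assumption stated above: audio length
# divided by measured processing time. Taking the median across requests
# makes the reported figure robust to occasional slow outliers.
from statistics import median

def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    return audio_seconds / processing_seconds

# Hypothetical requests: (audio length, processing time) in seconds
requests = [(600, 4.1), (600, 3.8), (600, 12.0)]  # one slow outlier
factors = [speed_factor(a, p) for a, p in requests]
median_factor = median(factors)  # unaffected by the 12 s outlier
```

This is why the charts report both a median and a variance: two providers with the same median speed can differ greatly in tail latency.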
Price
Note: Groq charges for a minimum of 10 seconds per request.
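To see what such a per-request minimum means in practice, here is a sketch of effective cost under a minimum billed duration. The 10-second minimum comes from the note above; the price figure is illustrative, and the function name is hypothetical.

```python
# Effective cost with a per-request minimum billed duration, as described
# in the Groq note above. Prices here are per 1,000 minutes of audio.

def billed_cost(audio_seconds: float, usd_per_1000_min: float,
                min_billed_seconds: float = 10.0) -> float:
    billed = max(audio_seconds, min_billed_seconds)  # round up to the minimum
    return (billed / 60.0) * (usd_per_1000_min / 1000.0)

# A 2-second clip is billed as 10 s, so workloads of many short clips pay
# 5x the nominal rate, while long files are unaffected:
short_cost = billed_cost(2.0, usd_per_1000_min=0.67)    # billed as 10 s
long_cost = billed_cost(600.0, usd_per_1000_min=0.67)   # billed as-is
```

For short-utterance workloads (voice commands, IVR turns), the effective price per audio minute can therefore be several times the headline rate.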
| Provider | Model | Whisper version | Word Error Rate (%) | Median Speed Factor | Price (USD per 1000 minutes) |
|---|---|---|---|---|---|
| | Whisper Large v2 | large-v2 | 15.8% | 29.4 | 6.00 |
| | Whisper Large v2 | large-v2 | 27.2% | 34.0 | 6.00 |
| | Whisper Large v3 | large-v3 | 16.8% | 312.5 | 0.50 |
| | Incredibly Fast Whisper | large-v3 | 18.2% | 61.0 | 1.49 |
| | Whisper Large v2 | large-v2 | 15.8% | 2.4 | 3.47 |
| | Whisper Large v3 | large-v3 | 24.6% | 3.1 | 4.23 |
| | WhisperX | large-v3 | 16.3% | 7.7 | 1.09 |
| | Whisper (M) | medium | | | 2.68 |
| | Whisper (S) | small | | | 1.37 |
| | Whisper Large v3 | large-v3 | 16.8% | 212.3 | 1.85 |
| | Distil-Whisper | | | | 0.33 |
| | Whisper Large v3 | large-v3 | 16.8% | 142.0 | 1.15 |
| | Whisper Large v3 | large-v3 | 16.8% | 98.0 | 0.45 |
| | Whisper Large v3 Turbo | v3 Turbo | | 256.6 | 0.67 |
| | Whisper Large v3 | large-v3 | | 416.6 | 1.00 |
| | Whisper Large v3 Turbo | v3 Turbo | 17.8% | 456.8 | 1.00 |
| | Whisper Large v3 | large-v3 | 16.8% | 169.3 | 1.67 |
| | Whisper Large v3 | large-v3 | 24.6% | 140.7 | 1.50 |
| Speechmatics | Standard | | 16.0% | 17.6 | 13.33 |
| Speechmatics | Enhanced | | | 17.6 | 6.70 |
| | Azure AI Speech Service | | 17.2% | 2.0 | 16.67 |
| | Nano | | 16.3% | 85.0 | 2.00 |
| AssemblyAI | Universal | | 14.5% | 85.6 | 6.17 |
| | Slam-1 | | 15.2% | 52.5 | 4.50 |
| | Nova-2 | | 17.3% | 162.9 | 4.30 |
| | Base | | 21.9% | 184.2 | 12.50 |
| | Nova-3 | | 18.3% | 179.1 | 4.30 |
| | Gladia v2 | whisper-v2-variant | 16.7% | 50.4 | 10.20 |
| | Amazon Transcribe | | 14.0% | 17.3 | 24.00 |
| | Fish Speech to Text | | | | 0.00 |
| | Rev AI | | 15.2% | | 20.00 |
| Google | Chirp 2 | | 11.6% | 18.1 | 16.00 |
| | Chirp | | 16.9% | 13.3 | 16.00 |
| Google | Chirp 3 | | 15.0% | 30.2 | 16.00 |
| ElevenLabs | Scribe | | | 40.8 | 6.67 |
| | Gemini 2.0 Flash | | 17.9% | 51.5 | 1.40 |
| | Gemini 2.0 Flash Lite | | 16.6% | 52.9 | 0.19 |
| | Gemini 2.5 Flash Lite | | 16.1% | 101.3 | 0.58 |
| | Gemini 2.5 Flash | | 19.2% | 79.0 | 1.92 |
| | Gemini 2.5 Pro | | 15.0% | 11.7 | 0.00 |
| | GPT-4o Transcribe | | 21.3% | 26.8 | 6.00 |
| | GPT-4o Mini Transcribe | | 20.1% | 38.6 | 3.00 |
| IBM | Granite Speech 3.3 8B | | 15.7% | | 0.00 |
| | Parakeet RNNT 1.1B | | | 6.6 | 1.91 |
| NVIDIA | Parakeet TDT 0.6B V2 | | | 60.1 | 0.00 |
| NVIDIA | Canary Qwen 2.5B | | 13.2% | 145.3 | 0.00 |
| | Voxtral Mini | | 15.8% | 65.2 | 1.00 |
| | Voxtral Small | | 14.7% | 67.8 | 4.00 |
| | Voxtral Small | | 14.7% | 23.5 | 3.00 |
| | Voxtral Mini | | 15.8% | 63.9 | 1.00 |
| | Qwen3 ASR Flash | | 15.0% | | 1.92 |
Speech to Text providers compared: OpenAI, Speechmatics, Microsoft Azure, AssemblyAI, fal.ai, Replicate, Deepgram, Gladia, Groq, Deepinfra, Fireworks, Amazon Bedrock, Fish Audio, Rev AI, Google, ElevenLabs, SambaNova, IBM, Together.ai, Mistral, NVIDIA, and Alibaba Cloud.