Fish Audio: API Provider Benchmarking & Analysis
Analysis of API providers of Fish Audio across performance metrics including word error rate, speed, and price.
API providers compared include OpenAI, AssemblyAI, Speechmatics, Microsoft Azure, fal.ai, Replicate, Deepgram, Gladia, Groq, Deepinfra, Fireworks, Amazon Bedrock, Fish Audio, Rev AI, Google, ElevenLabs, SambaNova, IBM, Together.ai, and Mistral.
Highlights
Word error rate: % of words transcribed incorrectly (lower is better)
Speed factor: input audio seconds transcribed per second (higher is better)
Price: USD per 1000 minutes of audio (lower is better)
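The speed factor defined above is a ratio of audio duration to processing time. A minimal sketch of that calculation follows; the function names and the sample timings are hypothetical, and taking the median across runs is an assumption based on the "Median Speed Factor" column name rather than a documented procedure.

```python
from statistics import median


def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Input audio seconds transcribed per second of processing time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def median_speed_factor(runs: list[tuple[float, float]]) -> float:
    """Median speed factor over (audio_seconds, processing_seconds) pairs.

    Aggregating with the median is an assumption, not a confirmed method.
    """
    return median(speed_factor(audio, elapsed) for audio, elapsed in runs)


# Hypothetical timings: a 60 s clip transcribed in 2 s has a speed factor of 30.
example_runs = [(60.0, 2.0), (120.0, 3.0), (30.0, 3.0)]
print(speed_factor(60.0, 2.0))
print(median_speed_factor(example_runs))
```

A speed factor above 1 means the service transcribes faster than real time.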
Summary Analysis
Word Error Rate vs. Price
Scatter plot of word error rate (% of words transcribed incorrectly) against price (USD per 1000 minutes of audio). Bubble size represents speed factor (input audio seconds transcribed per second). The most attractive quadrant combines low word error rate with low price.
Models shown: Amazon Transcribe; Azure AI Speech Service; Chirp 2; Enhanced; GPT-4o Mini Transcribe; GPT-4o Transcribe; Nova-3; Parakeet RNNT 1.1B, Replicate; Scribe; Universal-2; Voxtral Small; Whisper (L, v2), OpenAI; Whisper (L, v3, Turbo), Groq; Whisper (L, v3), Fireworks; Whisper (L, v3), SambaNova; Wizper (L, v3), fal.ai.
Word Error Rate: Percentage of words transcribed incorrectly. The evaluation was updated in June 2024 to use 5,000 test samples.
Artificial Analysis' independent evaluation is based on Common Voice v16.1, Mozilla's leading open-source speech-to-text dataset. Further detail is available on the methodology page.
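As an illustration of the word error rate metric, the sketch below uses the conventional definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. This is an assumption; the page only says "percentage of words incorrect", and the text normalization here (lowercasing, whitespace splitting) is hypothetical and may differ from what the evaluation applies.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, via word-level Levenshtein distance.

    Normalization (lowercase, split on whitespace) is an illustrative
    assumption, not the evaluation's documented procedure.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("a b c d", "a x c d"))
```

Because insertions count as errors, this definition can exceed 100% when the hypothesis is much longer than the reference.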
Price: Cost in USD per 1000 minutes of audio transcribed. Reflects the pricing model of the transcription service or software.
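To make the price unit concrete, the sketch below converts a price quoted in USD per 1000 minutes into the cost of a given amount of audio and into an hourly rate. The function names and the $6.00 figure are hypothetical examples, not prices taken from this page, and the calculation assumes simple linear pricing with no minimums or tiers.

```python
def transcription_cost_usd(audio_minutes: float, price_per_1000_minutes: float) -> float:
    """Cost of transcribing audio_minutes at a price quoted per 1000 minutes."""
    if audio_minutes < 0 or price_per_1000_minutes < 0:
        raise ValueError("inputs must be non-negative")
    return audio_minutes / 1000.0 * price_per_1000_minutes


def price_per_hour_usd(price_per_1000_minutes: float) -> float:
    """Equivalent price for one hour (60 minutes) of audio."""
    return transcription_cost_usd(60.0, price_per_1000_minutes)


# Hypothetical price of $6.00 per 1000 minutes.
print(transcription_cost_usd(250.0, 6.0))
print(price_per_hour_usd(6.0))
```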
Word Error Rate vs. Speed Factor
Scatter plot of word error rate (% of words transcribed incorrectly) against speed factor (input audio seconds transcribed per second). Bubble size represents price (USD per 1000 minutes of audio). The most attractive quadrant combines low word error rate with high speed factor.
Models shown: Amazon Transcribe; Chirp 2; Enhanced; GPT-4o Mini Transcribe; GPT-4o Transcribe; Nova-3; Parakeet RNNT 1.1B, Replicate; Scribe; Universal-2; Voxtral Small; Whisper (L, v2), OpenAI; Whisper (L, v3, Turbo), Groq; Whisper (L, v3), Fireworks; Whisper (L, v3), SambaNova; Wizper (L, v3), fal.ai.
Summary of key metrics & further information
| Provider | Model | Whisper version | Footnotes | Word Error Rate (%) | Median Speed Factor | Price (USD per 1000 minutes) | Further Details |
|---|---|---|---|---|---|---|---|
| | Whisper Large v2 | large-v2 | | 10.6% | | | |