Artificial Analysis

Google Chirp: API Provider Benchmarking & Analysis

Analysis of API providers of Google Chirp across performance metrics including word error rate, speed, and price.
API providers compared include OpenAI, AssemblyAI, Speechmatics, Microsoft Azure, fal.ai, Replicate, Deepgram, Gladia, Groq, Deepinfra, Fireworks, Amazon Bedrock, Rev AI, and Google.
Creator: Google
License: Proprietary

Highlights

Word error rate
Word error rate: % of words transcribed incorrectly (June '24), Lower is better
Speed Factor
Speed factor: Input audio seconds transcribed per second, Higher is better
Price
Price: USD per 1000 minutes of audio, Lower is better

Summary analysis

Word Error Rate vs. Price

Word Error Rate: Percentage of words transcribed incorrectly. Evaluation updated in June 2024 to use 5,000 test samples.
Artificial Analysis' independent evaluation is based on Common Voice v16.1, Mozilla's leading open-source speech-to-text dataset. Further detail is available on the methodology page.
Price: Cost in USD per 1000 minutes of audio transcribed. Reflects the pricing model of the transcription service or software.
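The word error rate definition above can be illustrated with a minimal sketch. It assumes the common formulation (word-level edit distance divided by the number of reference words) and a simple lowercase, whitespace-split normalization. The page does not specify the exact normalization Artificial Analysis applies, so the function name and preprocessing here are illustrative assumptions only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is lowercased and split on whitespace; real
    evaluations usually apply a more careful normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One of six reference words is dropped in the transcription.
print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on mat'):.1%}")  # 16.7%
```

Because insertions count as errors, this ratio can exceed 100% when the transcription contains many extra words.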

Word Error Rate vs. Speed Factor

Speed Factor: Audio file seconds transcribed per second of processing time. Higher factor indicates faster transcription speed.
Artificial Analysis' measurements are based on an audio duration of 10 minutes. Speed Factor may vary for other durations, particularly for very short durations (under 1 minute).
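As a worked illustration of the Speed Factor definition above, the sketch below divides audio duration by processing time. The 12-second processing time is a made-up example value, not a measured result for any provider.

```python
def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of input audio transcribed per second of processing time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# A 10-minute (600 s) file transcribed in a hypothetical 12 s.
print(speed_factor(600, 12))  # 50.0
```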

Speed Factor vs. Price


Word Error Rate

Word error rate: % of words transcribed incorrectly (June '24), Lower is better

Speed Factor

Speed factor: Input audio seconds transcribed per second, Higher is better

Speed Factor, Variance

Speed factor: Input audio seconds transcribed per second, Results by percentile, Higher is better
Median; other points represent the 5th, 25th, 75th, and 95th percentiles, respectively
Boxplot: Shows variance of measurements
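The percentile points described above could be derived from a set of Speed Factor measurements roughly as follows. This is a sketch that assumes linear interpolation between closest ranks. The page does not say which percentile method is used, and the helper names are hypothetical.

```python
def percentile(sorted_values: list[float], p: float) -> float:
    """p-th percentile (0-100) using linear interpolation between ranks."""
    if not sorted_values:
        raise ValueError("at least one measurement is required")
    rank = (len(sorted_values) - 1) * p / 100
    lower = int(rank)  # rank is non-negative, so int() acts as floor
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def boxplot_summary(measurements: list[float]) -> dict[int, float]:
    """Return the 5th, 25th, 50th (median), 75th, and 95th percentiles."""
    values = sorted(measurements)
    return {p: percentile(values, p) for p in (5, 25, 50, 75, 95)}
```

A wide gap between the 5th and 95th percentiles indicates that a provider's transcription speed varies considerably between requests.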

Speed Factor, Over Time

Over time measurement: Median measurement per day, based on 8 measurements each day at different times. Labels represent start of week's measurements.
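The daily aggregation described above (one median per day from several measurements) can be sketched as below. The input shape, a list of `(timestamp, speed_factor)` pairs, and the function name are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import median


def daily_medians(samples: list[tuple[datetime, float]]) -> dict[date, float]:
    """Group measurements by calendar day and return each day's median."""
    by_day: dict[date, list[float]] = defaultdict(list)
    for timestamp, value in samples:
        by_day[timestamp.date()].append(value)
    # Sort by day so the result reads as a time series.
    return {day: median(values) for day, values in sorted(by_day.items())}
```

Using the median rather than the mean keeps a single slow or fast request from distorting a day's plotted value.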

Price

Price: USD per 1000 minutes of audio, Lower is better
For providers that price on processing time rather than audio duration (incl. Replicate, fal.ai), we have calculated an indicative per-minute price based on the processing time expected per minute of audio. Further detail is available on the methodology page.
Note: Groq charges for a minimum of 10s per request.
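One plausible way to derive an indicative price for processing-time-based providers is sketched below. The formula is an assumption, since the exact calculation lives on the methodology page: it converts a per-second processing price into a price per 1000 minutes of audio using the Speed Factor. The second helper shows how a per-request minimum might affect billing, assuming the minimum applies to billed audio seconds. All names and example values are hypothetical.

```python
def indicative_price_per_1000_min(
    price_per_processing_second: float, speed_factor: float
) -> float:
    """Estimated USD per 1000 minutes of audio for compute-priced providers.

    Assumption: processing time per audio minute is 60 / speed_factor.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    processing_seconds_per_audio_minute = 60 / speed_factor
    return price_per_processing_second * processing_seconds_per_audio_minute * 1000


def billed_seconds(audio_seconds: float, minimum_seconds: float = 10.0) -> float:
    """Seconds billed for one request when a per-request minimum applies."""
    return max(audio_seconds, minimum_seconds)
```

Under this assumption a 3-second clip would be billed as 10 seconds, so the effective per-minute price for very short requests is higher than the listed rate.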

Summary of key metrics & further information

Model | Provider
Whisper (large-v2) | OpenAI
Whisper (large-v2) | Microsoft Azure
Wizper (Large v3) | fal.ai
Incredibly Fast Whisper | Replicate
Whisper (large-v2) | Replicate
Whisper (large-v3) | Replicate
WhisperX | Replicate
Whisper (medium) | Replicate
Whisper (small) | Replicate
Whisper (large-v3) | Groq
Distil-Whisper, Groq | Groq
Whisper (large-v3), Deepinfra | Deepinfra
Whisper (large-v3) | fal.ai
Distil-Whisper, Deepinfra | Deepinfra
Whisper (large-v3 Turbo) | Groq
Whisper (large-v3) | Fireworks
Whisper (large-v3 Turbo) | Fireworks
AssemblyAI (Universal-1) | AssemblyAI
Nano | AssemblyAI
Speechmatics Standard | Speechmatics
Speechmatics Enhanced | Speechmatics
Azure Speech Service | Microsoft Azure
Nova-2 | Deepgram
Base | Deepgram
Whisper Large v2 | Deepgram
Gladia | Gladia
Amazon Transcribe | Amazon Bedrock
Rev AI | Rev AI
Cloud Speech-To-Text (Chirp) | Google