Artificial Analysis

Google Cloud TTS: Quality, Generation Time & Price Analysis

Analysis of Google's models and comparison to other audio models across key metrics including quality, generation time, and price. API providers compared include OpenAI, Google, Amazon Bedrock, Microsoft Azure, Replicate, Cartesia, ElevenLabs, and LMNT.

For further details, see our methodology page.

Creator: Google

Highlights

Quality ELO: Average Arena ELO rating of the model (higher is better)
Characters per Second: Number of characters processed per second of generation time (higher is better)
Price: USD per 1M characters of text (lower is better)

Summary Analysis

Quality vs. Price

[Chart: Arena ELO (average ELO rating of the model) vs. Price (USD per 1M characters of text). Marker size represents characters processed per second of generation time. The most attractive quadrant is marked.]
Quality ELO: Relative ELO score of the models as determined by responses from users in Artificial Analysis' Quality Arena. Some models may not be shown because they do not yet have enough votes. Note that this is intended to represent quality for generalist use cases (conversational AI assistant, customer support system, or reading an email) and may not be representative of all use cases.
Price: Price per 1M characters of text. For detail on how we calculate price for providers which price based on inference time or subscription plans, see our methodology page.

Quality vs. Speed

[Chart: Arena ELO (average ELO rating of the model) vs. Characters per Second (characters processed per second of generation time). Marker size represents price in USD per 1M characters of text. The most attractive quadrant is marked.]
Quality ELO: Relative ELO score of the models as determined by responses from users in Artificial Analysis' Quality Arena. Some models may not be shown because they do not yet have enough votes. Note that this is intended to represent quality for generalist use cases (conversational AI assistant, customer support system, or reading an email) and may not be representative of all use cases.
Characters per Second: Number of characters processed per second of generation time. Higher values indicate faster generation speeds.

Speed vs. Price

Characters per Second: Number of characters processed per second of generation time. Higher values indicate faster generation speeds.
Price: Price per 1M characters of text. For detail on how we calculate price for providers which price based on inference time or subscription plans, see our methodology page.

Quality Arena ELO (Text to Speech Arena)

Arena ELO: Average ELO rating of the model, Higher is better
Quality ELO: Relative ELO score of the models as determined by responses from users in Artificial Analysis' Quality Arena. Some models may not be shown because they do not yet have enough votes. Note that this is intended to represent quality for generalist use cases (conversational AI assistant, customer support system, or reading an email) and may not be representative of all use cases.
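
The page does not describe how the Arena ELO is computed from user votes. As a rough illustration only, the sketch below applies the textbook Elo update to a single head-to-head vote. The function names, the K-factor of 32, and the 400-point scale are assumptions taken from the classic formulation, not from Artificial Analysis' methodology.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the classic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one pairwise vote.

    k is an assumed K-factor; the arena's real procedure is not stated.
    """
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two equally rated models; the first one wins the vote.
new_a, new_b = update_elo(1000.0, 1000.0, a_won=True)
print(new_a, new_b)
```

Because the ratings are relative, only the differences between models carry meaning, which matches the "relative ELO score" wording above.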

Arena Win Rate

Arena Win Rate: % Win rate in Text to Speech Arena, Higher is better
Win Rate: Proportion of time an audio clip generated by the model was selected as preferred compared to the other audio clip present in Artificial Analysis' Quality Arena.
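
The win rate definition above reduces to a simple proportion. A minimal sketch, assuming each arena comparison yields exactly one preferred clip (the function and argument names are hypothetical):

```python
def win_rate_percent(wins: int, comparisons: int) -> float:
    """Share of pairwise comparisons in which the model's clip was preferred."""
    if comparisons <= 0:
        raise ValueError("comparisons must be positive")
    if not 0 <= wins <= comparisons:
        raise ValueError("wins must be between 0 and comparisons")
    return 100.0 * wins / comparisons


# Illustrative numbers only, not taken from the arena.
print(win_rate_percent(wins=30, comparisons=50))
```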

Participate in the Speech Arena to contribute to the crowdsourced quality evaluations.

Characters Per Second

Characters processed per second: # of characters per second of generation time, Higher is better
Characters per Second: Number of characters processed per second of generation time. Higher values indicate faster generation speeds.
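
As a sketch of how this metric could be measured, the snippet below times a text-to-speech call and divides the input length by the elapsed time. The `synthesize` callable is a hypothetical stand-in for any provider's API call, and counting characters with `len()` is an assumption; the page does not say how characters are counted.

```python
import time
from typing import Callable


def characters_per_second(text: str, generation_seconds: float) -> float:
    """Characters of input text processed per second of generation time."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return len(text) / generation_seconds


def measure_cps(text: str, synthesize: Callable[[str], object]) -> float:
    """Time one call to a (hypothetical) synthesize function and return CPS."""
    start = time.perf_counter()
    synthesize(text)  # stand-in for a real TTS request
    elapsed = time.perf_counter() - start
    return characters_per_second(text, elapsed)


# 500 characters generated in 2 seconds.
print(characters_per_second("a" * 500, 2.0))
```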

Speed Factor

Speed factor: Output audio seconds generated per second, Higher is better
Speed Factor: Output audio seconds generated per second. Higher values indicate faster generation speeds. Characters per second is generally preferred as a benchmark of API generation speed as there is variable output audio seconds per character depending on the model (e.g. slower speaking voice).
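
The speed factor and the reason characters per second is preferred can be shown with a small sketch. The numbers and names below are illustrative assumptions: two voices receive the same text and take the same time to generate, but one speaks more slowly and so produces more audio.

```python
def speed_factor(audio_seconds: float, generation_seconds: float) -> float:
    """Seconds of output audio produced per second of generation time."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return audio_seconds / generation_seconds


text_length = 600        # characters in the input text (illustrative)
generation_seconds = 1.5  # identical API generation time for both voices

fast_voice = speed_factor(audio_seconds=30.0, generation_seconds=generation_seconds)
slow_voice = speed_factor(audio_seconds=36.0, generation_seconds=generation_seconds)

# The slower-speaking voice gets a higher speed factor even though the API
# was no faster; characters per second is the same for both.
print(fast_voice, slow_voice, text_length / generation_seconds)
```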

Characters Per Second, Variance

Characters processed per second: # of characters per second of generation time, Results by percentile, Higher is better
Median; other points represent the 5th, 25th, 75th, and 95th percentiles, respectively.
Characters per Second: Number of characters processed per second of generation time. Higher values indicate faster generation speeds.
Boxplot: Shows variance of measurements
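
The percentile points named above can be computed from a set of measurements as sketched below. This uses Python's `statistics.quantiles` with the inclusive interpolation method; the interpolation method actually used for the chart is not stated, so treat that choice as an assumption.

```python
from statistics import quantiles


def percentile_summary(values):
    """Return the 5th, 25th, 50th (median), 75th, and 95th percentiles."""
    # n=100 yields 99 cut points; cut point p-1 is the p-th percentile.
    cuts = quantiles(values, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in (5, 25, 50, 75, 95)}


# Illustrative characters-per-second measurements: 0, 1, ..., 100.
measurements = list(range(101))
print(percentile_summary(measurements))
```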

Characters per Second, Over Time

Characters processed per second: # of characters per second of generation time, Higher is better
Characters per Second: Number of characters processed per second of generation time. Higher values indicate faster generation speeds.
Over time measurement: Median measurement per day, based on 4 measurements each day at different times. Labels represent start of week's measurements.
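
The daily aggregation described above can be sketched as a group-by-day followed by a median. The sample timestamps and values are invented for illustration; only the "four measurements per day, take the median" rule comes from the description.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def daily_medians(samples):
    """Map each calendar day to the median of that day's measurements.

    samples: iterable of (timestamp, characters_per_second) pairs.
    """
    by_day = defaultdict(list)
    for timestamp, cps in samples:
        by_day[timestamp.date()].append(cps)
    return {day: median(values) for day, values in sorted(by_day.items())}


# Four illustrative measurements per day, taken at different times.
samples = [
    (datetime(2024, 1, 1, 0), 100.0),
    (datetime(2024, 1, 1, 6), 120.0),
    (datetime(2024, 1, 1, 12), 110.0),
    (datetime(2024, 1, 1, 18), 130.0),
    (datetime(2024, 1, 2, 0), 90.0),
    (datetime(2024, 1, 2, 6), 95.0),
    (datetime(2024, 1, 2, 12), 105.0),
    (datetime(2024, 1, 2, 18), 100.0),
]
result = daily_medians(samples)
print(result)
```

Using the median rather than the mean keeps a single unusually slow or fast measurement from dominating a day's value.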

Price

Price: USD per 1M characters of text, Lower is better
Price: Price per 1M characters of text. For detail on how we calculate price for providers which price based on inference time or subscription plans, see our methodology page.
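
For providers that bill per character, a per-1M-character price converts to a job cost as sketched below. The price used in the example is a placeholder, not a figure from this page, and the conversion for providers that bill by inference time or subscription is not covered here.

```python
def estimate_cost_usd(num_characters: int, price_per_million_usd: float) -> float:
    """Cost of synthesizing num_characters at a per-1M-character price."""
    if num_characters < 0 or price_per_million_usd < 0:
        raise ValueError("inputs must be non-negative")
    return num_characters / 1_000_000 * price_per_million_usd


# Placeholder price of 16 USD per 1M characters, 250,000 characters of text.
print(estimate_cost_usd(250_000, 16.0))
```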

Streaming Support

Provider    Streaming Support
OpenAI
Google
Amazon
Azure
Replicate
ElevenLabs
LMNT
Streaming Support: Indicates whether the provider supports streaming of audio from their API. We plan to add performance benchmarking of streaming support in the future.

Summary of Key Metrics & Further Information

Provider    Further Details
Studio, Google Cloud TTS (Google)
Journey, Google Cloud TTS (Google)
WaveNet, Google Cloud TTS (Google)
Standard, Google Cloud TTS (Google)
Neural2, Google Cloud TTS (Google)