Text to Speech Benchmarking Methodology
Scope & Background
Artificial Analysis performs benchmarking on Text to Speech models delivered via serverless API endpoints. This page describes our Text to Speech benchmarking methodology, including both our quality benchmarking and performance benchmarking. We consider Text to Speech endpoints to be serverless when customers only pay for usage, not a fixed rate for access.
For both our performance benchmarking and the Speech Arena, we aim to reflect the experience of end users of the serverless APIs. We benchmark the time to receive the audio file locally. Where the API response is a URL rather than bytes, we include the time taken to download the file in our response time measurement. Our approach is to use the standard implementation of provider APIs as suggested by each provider's documentation. Where the provider's API offers the option, we standardize the sample rate of audio to 22.05 kHz.
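As an illustration of measuring the time to receive audio locally, the following Python sketch times a request until the audio bytes are in hand, including the download when the provider returns a URL. The names `timed_generation`, `request_audio` and `download` are hypothetical placeholders for a provider client, not part of any provider's API or of the actual benchmarking harness.

```python
import time
from typing import Callable, Tuple, Union


def timed_generation(
    request_audio: Callable[[str], Union[bytes, str]],
    download: Callable[[str], bytes],
    text: str,
) -> Tuple[bytes, float]:
    """Return (audio_bytes, seconds elapsed until the audio is held locally).

    request_audio and download are hypothetical callables standing in for a
    provider's client; request_audio may return raw bytes or a URL string.
    """
    start = time.perf_counter()
    response = request_audio(text)
    # When the provider answers with a URL, the download is part of the
    # measured time, because the user does not have the audio until it ends.
    audio = download(response) if isinstance(response, str) else response
    elapsed = time.perf_counter() - start
    return audio, elapsed
```

Passing the provider call and the download step in as callables keeps the timing logic identical across providers, whichever response style they use.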
Key Metrics
We use the following metrics to track quality, performance and price for Text to Speech models.
Quality ELO
Relative ELO score of the models as determined by responses from users in the Artificial Analysis Text to Speech Arena.
Some models may not be shown because they do not yet have enough votes. We use a Linear Regression model, similar to how LMSYS calculates ELO scores for Chatbot Arena.
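The exact model fitted for the Speech Arena is not specified here. As a rough illustration of how pairwise votes can be turned into ELO-style ratings, the sketch below fits a Bradley-Terry model, which is the kind of model Chatbot Arena fits to its votes. It uses a simple iterative update in pure Python. The function name, the 400-point scale and the 1000-point baseline are assumptions chosen for illustration, and ties are ignored.

```python
import math
from collections import defaultdict


def bradley_terry_ratings(votes, iterations=200, scale=400.0, base_rating=1000.0):
    """Fit ELO-style ratings from (winner, loser) vote pairs.

    Illustrative only: assumes every model has at least one win and is
    connected to the others through votes.
    """
    wins = defaultdict(float)
    pair_counts = defaultdict(float)
    models = set()
    for winner, loser in votes:
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        models.update((winner, loser))

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            # Sum over every opponent of m: comparisons / (p_m + p_opponent).
            denom = sum(
                count / (strength[m] + strength[other])
                for pair, count in pair_counts.items() if m in pair
                for other in pair - {m}
            )
            updated[m] = wins[m] / denom if denom > 0 else strength[m]
        # Rescale so the geometric mean of the strengths stays at 1.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}

    return {m: base_rating + scale * math.log10(s) for m, s in strength.items()}
```

With this scale, a model that wins three out of four head-to-head votes against another ends up roughly 190 points above it.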
Price per 1M Characters
All TTS models are reported as price per 1M characters of input text. Where providers do not price per character directly, we derive an equivalent:
- Per character: Listed rate used directly
- Subscription plans: Effective rate from the lowest-cost plan, annual where available, that includes at least 1M characters, assuming 80% utilization
- Token-based: Derived using batch pricing, converting characters to input tokens and estimating audio output duration
- Output duration: Converted using an assumed speaking rate of 150 words per minute, approximately 825 characters per minute
- Per byte: 1 UTF-8 byte is approximately 1 character for English text
- Inference time: Estimated from benchmark runs using ~25 texts of ~500 characters
- Open weights: No commercial API pricing; listed without price
When reporting price, we do not include temporary discounts.
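Several of the conversions listed above are simple arithmetic. The Python sketch below works through the per-character, subscription, output-duration and per-byte cases. The function names are hypothetical, and the reading of "80% utilization" as dividing the plan price by 80% of the included characters is an assumption. The token-based and inference-time cases depend on provider-specific inputs and are not shown.

```python
CHARS_PER_MILLION = 1_000_000
CHARS_PER_MINUTE = 825  # 150 words per minute, as stated in the list above


def price_from_per_character(rate_per_char: float) -> float:
    """Listed per-character rate, scaled to 1M characters."""
    return rate_per_char * CHARS_PER_MILLION


def price_from_subscription(plan_price: float, included_chars: int,
                            utilization: float = 0.8) -> float:
    """Effective price per 1M characters for a plan.

    Assumption: only `utilization` of the included characters are used,
    so the plan price is spread over fewer characters.
    """
    if included_chars < CHARS_PER_MILLION:
        raise ValueError("plan must include at least 1M characters")
    return plan_price / (included_chars * utilization) * CHARS_PER_MILLION


def price_from_output_duration(price_per_minute: float) -> float:
    """Convert a price per minute of audio using 825 characters per minute."""
    return price_per_minute / CHARS_PER_MINUTE * CHARS_PER_MILLION


def price_from_per_byte(price_per_byte: float) -> float:
    """Treat 1 UTF-8 byte as 1 character, which holds for English text."""
    return price_per_byte * CHARS_PER_MILLION
```

For example, under these assumptions a plan costing 80 per period with 1M included characters works out to an effective 100 per 1M characters.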
Generation Time
Median time the provider takes to generate a single audio clip with ~500 input characters, calculated over the past 14 days of measurements.
Generation Time includes downloading the audio clip from the provider where a URL is provided rather than an audio response. This reflects the end-user latency of receiving a generated audio clip, as URLs can be generated before the audio is complete. Audio clips are generated at a batch size of 1 where relevant.
Benchmarking is conducted 4 times daily at random times each day. For each benchmarking evaluation we select a single random voice for each model. A unique prompt of ~500 characters is used for each generation.
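The median over a trailing 14-day window can be sketched as follows. The function name and the `(timestamp, seconds)` tuple layout are assumptions made for illustration, not a description of how measurements are actually stored.

```python
from datetime import datetime, timedelta
from statistics import median


def rolling_median(measurements, now, window_days=14):
    """Median generation time over the trailing window.

    measurements: iterable of (timestamp, seconds) tuples (hypothetical layout).
    Returns None when no measurement falls inside the window.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [seconds for ts, seconds in measurements if cutoff <= ts <= now]
    return median(recent) if recent else None
```

A median is less sensitive than a mean to the occasional very slow request, so a single outlier does not move the reported figure much.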
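One way to picture this schedule is the sketch below, which draws four random times within a day and one random voice per model for each run. It is an assumption about how such a plan could be generated, with hypothetical names throughout, and does not describe the actual scheduler.

```python
import random
from datetime import datetime, timedelta

SECONDS_PER_DAY = 24 * 60 * 60


def plan_daily_runs(day_start, voices_by_model, runs_per_day=4, rng=None):
    """Return a time-sorted list of (run_time, {model: voice}) for one day.

    voices_by_model maps a model name to its list of candidate voices.
    """
    if rng is None:
        rng = random.Random()
    plan = []
    for _ in range(runs_per_day):
        # Pick a uniformly random second within the day for this run.
        run_time = day_start + timedelta(seconds=rng.randrange(SECONDS_PER_DAY))
        # Pick a single random voice for each model in this run.
        voices = {model: rng.choice(options)
                  for model, options in voices_by_model.items()}
        plan.append((run_time, voices))
    return sorted(plan, key=lambda item: item[0])
```

Passing a seeded `random.Random` makes a plan reproducible, which is useful when testing.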
Model Voices
For each model tested, we test multiple voices to ensure that our comparison between models is representative and fair. Voice characteristics such as accent, gender and style are typically properties of the voices that each model can generate speech for, not of the underlying model. For each model we select 2 voices for each combination of Male and Female, and US and UK accents (8 voices in total across the 4 combinations). Where a gender and accent combination is not available, we exclude it from evaluation in the Speech Arena.
Voices are selected for each model based on their prominence in the provider's interface and documentation, excluding voices which are not neutral in nature (e.g. Los Angeles valley and deep southern accents are excluded). Creators of the models may also request that we use specific voices where many are available. Where voices are not provided, as is typically the case for open source models, we use voice clips from professional voice actors as source files for generating speech. All voice clips have been licensed for commercial use.
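The selection rule described above (two voices per gender and accent combination, skipping combinations a model does not offer) can be sketched in Python as follows. The function name and the `(name, gender, accent)` tuple layout are hypothetical, and the input list is assumed to be already ordered by prominence with non-neutral voices removed.

```python
from itertools import product

GENDERS = ("Female", "Male")
ACCENTS = ("US", "UK")


def select_voices(available, per_combination=2):
    """Pick up to `per_combination` voices for each gender/accent pair.

    available: list of (name, gender, accent) tuples, most prominent first.
    Combinations with no matching voice simply contribute nothing.
    """
    selected = []
    for gender, accent in product(GENDERS, ACCENTS):
        matches = [v for v in available if v[1] == gender and v[2] == accent]
        selected.extend(matches[:per_combination])
    return selected
```

A model offering every combination yields 8 voices, while a model with only US voices yields at most 4.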
Below, we list the voices used for each model.
| Model Name | Model Creator | Voices Used (Gender, Accent) |
|---|---|---|
| AsyncFlow V2 | async | Faith (Female, US), Penelope (Female, US), Griffin (Male, US), Xavier (Male, US), Robert (Male, UK), Chase (Male, UK), Nyomi (Female, UK), Violet (Female, UK) |
| Azure HD 2.5 | | Sonia (Mar 2026) (Female, UK), Andrew (Mar 2026) (Male, US), Ava - US (Mar 2026) (Female, US), Ollie (Mar 2026) (Male, UK), Brian (Mar 2026) (Male, US), Serena (Mar 2026) (Female, US), Andres 2 (Mar 2026) (Male, US), Ava - UK (Mar 2026) (Female, UK) |
| Azure Neural | | Andrew Multilingual (Male, US), Brian Multilingual (Male, US), Emma Multilingual (Female, US), Ava Multilingual (Female, US), Ava - UK (Mar 2026) (Female, UK), Sonia (Mar 2026) (Female, UK), Andrew (Mar 2026) (Male, US), Ava - US (Mar 2026) (Female, US), Ryan (Male, UK), Sonia (Female, UK), Libby (Female, UK), Alfie (Male, UK), Ollie (Mar 2026) (Male, UK), Brian (Mar 2026) (Male, US), Serena (Mar 2026) (Female, US), Andrew 2 (Mar 2026) (Male, US) |
| Chatterbox | | Abbey (Female, US), Susan (Female, US), Alan (Male, US), Michael (Male, US), Amy (Female, UK), Redd (Female, UK), Dave (Male, UK), Tom (Male, UK) |
| Chatterbox HD | | Susan (Female, US), Michael (Male, US), Abbey (Female, US), Alan (Male, US), Redd (Female, UK), Tom (Male, UK), Dave (Male, UK), Amy (Female, UK) |
| Chirp 3: HD | | Aoede (Female, US), Achird (Male, US), Algenib (Male, US), Achernar (Female, US), Autonoe (Female, UK), Callirrhoe (Female, UK), Algieba (Male, UK), Alnilam (Male, UK) |
| Eleven v3 | | Laura (Female, US), Jessica (Female, US), Jarnathan (Male, US), Liam (Male, US), Elizabeth (Female, UK), Shelley (Female, UK), Dan (Male, UK), Nathaniel (Male, UK) |
| Falcon (Beta) | | Carter (Male, US), Phoebe (Female, US), Terrell (Male, US), Natalie (Female, US), Theo (Male, UK), Mason (Male, UK), Ruby (Female, UK), Hazel (Female, UK) |
| Fish Audio S2 Pro | | Laura (Female, US), Sarah (Female, US), Jordan (Male, US), Adrian (Male, US) |
| Fish Speech 1.5 | | US Female Susan (Female, US), US Male Alan (Male, US), US Female Abbey (Female, US), US Male Michael (Male, US), UK Female Redd (Female, UK), UK Male Dave (Male, UK), UK Male Tom (Male, UK), UK Female Amy (Female, UK) |
| Flash v2.5 | | Liam (Male, US), Laura (Male, US), Jarnathan (Male, US), Jessica (Male, US), Nathaniel (Male, UK), Shelley (Female, UK), Dan (Male, UK), Elizabeth (Female, UK) |
| Gemini 2.5 Flash Lite TTS | | Achernar (Female, US), Achird (Male, US), Algenib (Male, US), Aoede (Female, US), Algieba (Male, UK), Alnilam (Male, UK), Autonoe (Female, UK), Callirrhoe (Female, UK) |
| Gemini 2.5 Flash TTS (Dec 2025) | | Algieba (Male, UK), Achernar (Female, US), Achird (Male, US), Algenib (Male, US), Aoede (Female, US), Alnilam (Male, UK), Autonoe (Female, UK), Callirrhoe (Female, UK) |
| Gemini 3.1 Flash TTS | | Fenrir (Male, US), Charon (Male, US), Aoede (Female, US), Zephyr (Female, US) |
| Inworld TTS 1 | | Ashley (Female, US), Reed (Male, US), Jason (Male, US), Sarah (Female, US), Eleanor (Female, UK), Sophie (Female, UK), Felix (Male, UK), Clive (Male, UK) |
| Inworld TTS 1 Max | | Ashley (Female, US), Reed (Male, US), Jason (Male, US), Sarah (Female, US), Eleanor (Female, UK), Sophie (Female, UK), Felix (Male, UK), Clive (Male, UK) |
| Inworld TTS 1.5 Max | | Ashley (Female, US), Reed (Male, US), Jason (Male, US), Sarah (Female, US), Sophie (Female, UK), Eleanor (Female, UK), Felix (Male, UK), Clive (Male, UK) |
| Inworld TTS 1.5 Mini | | Ashley (Female, US), Reed (Male, US), Jason (Male, US), Sarah (Female, US), Eleanor (Female, UK), Sophie (Female, UK), Felix (Male, UK), Clive (Male, UK) |
| Journey | | en-US-Journey-D (Male, US), en-US-Journey-F (Female, US) |
| Kokoro 82M v1.0 | | Fenrir (Male, US), Bella (Female, US), Michael (Male, US), Aoede (Female, US), George (Male, UK), Fable (Male, UK), Emma (Female, UK), Isabella (Female, UK) |
| LMNT | | daniel (Male, US), terrence (Male, US), lily (Female, US), chloe (Female, US), morgan (Female, UK) |
| Lightning v3.1 | | Olivia (Female, US), Daniel (Male, US), Magnus (Male, US), Quinn (Female, US), Edward (Male, UK), Noah (Female, UK), Poppy (Female, UK), Liam (Male, UK) |
| MAI-Voice-1 | | Jasper (Male, US), Reed (Male, US), Joy (Female, US), Iris (Female, US) |
| Magpie Multilingual | | Leo (Male, US), Mia (Female, US), Aria (Female, US), Jason (Male, US) |
| Magpie-Multilingual 357M | | Sofia (Female, US), Aria (Female, US), Jason (Male, US), Leo (Male, US), John (Male, US) |
| Magpie-Multilingual 357M (Feb 2026) | | Sofia (Female, US), Aria (Female, US), Jason (Male, US), Leo (Male, US), John (Male, US) |
| Maya1 | | Ava (Female, US), Noah (Male, US), James (Male, US), Emma (Female, US), Liam (Male, UK), Chloe (Female, UK), Oliver (Male, UK), Sophie (Female, UK) |
| MetaVoice v1 | | Susan (Female, US), Michael (Male, US), Abbey (Female, US), Alan (Male, US), Redd (Female, UK), Tom (Male, UK), Dave (Male, UK), Amy (Female, UK) |
| MiMo-V2-TTS | | MiMo Default (Female, US) |
| Multilingual v2 | | Jessica (Female, US), Laura (Female, US), Liam (Male, US), Jarnathan (Male, US), Nathaniel (Male, UK), Dan (Male, UK), Shelley (Female, UK), Elizabeth (Female, UK) |
| Murf Speech Gen 2 | | Carter (Male, US), Phoebe (Female, US), Terrell (Male, US), Natalie (Female, US), Theo (Male, UK), Mason (Male, UK), Ruby (Female, UK), Hazel (Female, UK) |
| Neuphonic TTS | | Emily (Female, US), Liz (Female, UK), Albert (Male, UK), Julia (Female, UK), Dave (Male, UK), Miles (Male, US), Annie (Female, US) |
| Neural2 | | en-US-Neural2-I (Male, US), en-US-Neural2-A (Male, US), en-US-Neural2-H (Female, US), en-US-Neural2-C (Female, US), en-GB-Neural2-B (Male, UK), en-GB-Neural2-D (Male, UK), en-GB-Neural2-C (Female, UK), en-GB-Neural2-A (Female, UK) |
| Octave 2 | | FEMALE MEDITATION GUIDE (Female, US), MALE PROTAGONIST (Male, US), DONOVAN SINCLAIR (Male, US), SITCOM GIRL (Female, US), NATURE DOCUMENTARY NARRATOR (Male, UK), ALICE BENNETT (Female, UK), LADY ELIZABETH (Female, UK), SAD OLD BRITISH MAN (Male, UK) |
| Octave TTS | | MALE PROTAGONIST (Male, US), DONOVAN SINCLAIR (Male, US), FEMALE MEDITATION GUIDE (Female, US), SITCOM GIRL (Female, US), ALICE BENNETT (Female, UK), LADY ELIZABETH (Female, UK), NATURE DOCUMENTARY NARRATOR (Male, UK), SAD OLD BRITISH MAN (Male, UK) |
| OpenAudio S1 | | US Female Susan (Female, US), US Male Michael (Male, US), US Male Alan (Male, US), US Female Abbey (Female, US), UK Male Dave (Male, UK), UK Female Redd (Female, UK), UK Male Tom (Male, UK), UK Female Amy (Female, UK) |
| OpenVoice v2 | | Susan (Female, US), Michael (Male, US), Abbey (Female, US), Alan (Male, US), Redd (Female, UK), Tom (Male, UK), Dave (Male, UK), Amy (Female, UK) |
| Polly Generative | | Matthew (Male, US), Ruth (Female, US), Danielle (Female, US), Stephen (Male, US) |
| Polly Long-Form | | Gregory (Male, US), Danielle (Female, US), Ruth (Female, US) |
| Polly Neural | | Joey (Male, US), Gregory (Male, US), Joanna (Female, US), Danielle (Female, US), Brian (Male, UK), Brian (Male, UK), Amy (Female, UK), Emma (Female, UK) |
| Polly Standard | | Joey (Male, US), Joanna (Female, US), Brian (Male, UK), Amy (Female, UK) |
| Qwen3 TTS | | Amy (Female, UK), Redd (Female, UK), Michael (Male, US), Dave (Male, UK), Tom (Male, UK), Alan (Male, US), Susan (Female, US), Abbey (Female, US) |
| Qwen3 TTS Flash | | Cherry (Female, US), Ryan (Male, US), Jennifer (Female, US), Ethan (Male, US) |
| SIMBA 1.0 | | Patricia (Female, US), Robert (Male, US), Douglas (Male, US), Christina (Female, US), Austin (Male, UK), Derek (Male, UK), Beverly (Female, UK), carol (Female, UK) |
| SIMBA 1.6 | | Patricia (Female, US), Robert (Male, US), Douglas (Male, US), Christina (Female, US), Austin (Male, UK), Derek (Male, UK), Beverly (Female, UK), Carol (Female, UK) |
| Sonic 3 | | Kiefer (Male, US), Brandon (Male, US), Tessa (Female, US), Linda (Female, US), Fiona - Witty Woman (Female, UK), Redd (Female, UK), Tom (Male, UK), Benedict (Male, UK) |
| Sonic English (Oct 2024) | | Nonfiction Man (Male, US), Newsman (Male, US), Helpful Woman (Female, US), Classy British Man (Male, UK), Southern Woman (Female, US), Polite Man (Male, UK), British Lady (Female, UK), British Narration Lady (Female, UK) |
| Speech 2.6 HD | | Captivating Storyteller (Male, US), Insightful Speaker (Male, US), Captivating Female (Female, US), Upbeat Woman (Female, US), Graceful Lady (Female, UK), Patient Man (Male, UK), Expressive Narrator (Male, UK), Compelling Lady (Female, UK) |
| Speech 2.6 Turbo | | Captivating Storyteller (Male, US), Insightful Speaker (Male, US), Captivating Female (Female, US), Upbeat Woman (Female, US), Patient Man (Male, UK), Expressive Narrator (Male, UK), Compelling Lady (Female, UK), Graceful Lady (Female, UK) |
| Speech 2.8 HD | | Captivating Storyteller (Male, US), Insightful Speaker (Male, US), Captivating Female (Female, US), Upbeat Woman (Female, US), Graceful Lady (Female, UK), Patient Man (Male, UK), Expressive Narrator (Male, UK), Compelling Lady (Female, UK) |
| Speech 2.8 Turbo | | Compelling Lady (Female, UK), Captivating Female (Female, US), Upbeat Woman (Female, US), Graceful Lady (Female, UK), Captivating Storyteller (Male, US), Insightful Speaker (Male, US), Patient Man (Male, UK), Expressive Narrator (Male, UK) |
| Speech-02-HD | | English Sweet Female (Female, US), English Lively Male (Male, US), English Powerful Female (Female, UK), English_Magnetic_Male_2 (Male, UK) |
| Speech-02-Turbo | | English Lively Male (Male, US), English Sweet Female (Female, US), English Magnetic Male (Male, UK), English Powerful Female (Female, UK) |
| Standard | | en-US-Standard-F (Female, US), en-US-Standard-I (Male, US), en-US-Standard-A (Male, US), en-US-Standard-C (Female, US), en-GB-Standard-A (Female, UK), en-GB-Standard-D (Male, UK), en-GB-Standard-C (Female, UK), en-GB-Standard-B (Male, UK) |
| Step Audio EditX (Mar 2026) | | HT-M319 (Male, US), HT-F404 (Female, US) |
| Step TTS 2 (Mar 2026) | | Vibrant Youth (Male, US), Lively Girl (Female, US) |
| Studio | | en-US-Studio-Q (Male, US), en-US-Studio-O (Female, US), en-GB-Studio-C (Female, UK), en-GB-Studio-B (Male, UK) |
| StyleTTS 2 | | Susan (Female, US), Michael (Male, US), Abbey (Female, US), Alan (Male, US), Redd (Female, UK), Tom (Male, UK), Dave (Male, UK), Amy (Female, UK) |
| T2A-01-HD | | Mature Boss (Female, US), English Steady Mentor (Female, US), Gentle-voiced_man (Male, US), English Whimsical Girl (Female, US), Decent Young Man (Male, UK), Wise Scholar (Male, UK), Female Narrator (Female, UK), Wise Lady (Female, UK) |
| T2A-01-Turbo | | English Steady Mentor (Male, US), English Whimsical Girl (Female, US), English Gentle Voiced Man (Male, US), Boss Lady (Female, US), Wise Lady (Female, UK), English Wise Scholar (Male, UK), Decent Young Man (Male, UK), Anime Character (Female, UK) |
| TTS-1 | | echo (Male, US), alloy (Female, US), nova (Female, US), shimmer (Female, US), onyx (Male, US), fable (Male, UK) |
| TTS-1 HD | | nova (Female, US), alloy (Female, US), onyx (Male, US), shimmer (Female, US), echo (Male, US), fable (Male, UK) |
| Turbo v2.5 | | Laura (Female, US), Jessica (Female, US), Jarnathan (Male, US), Liam (Male, US), Elizabeth (Female, UK), Shelley (Female, UK), Dan (Male, UK), Nathaniel (Male, UK) |
| VibeVoice 1.5B | | Maya (Female, US), Alice (Female, US), Carter (Male, US), Frank (Male, US) |
| VibeVoice 7B | | Alice (Female, US), Maya (Female, US), Carter (Male, US), Frank (Male, US) |
| Voxtral TTS | | Paul (Male, US), Oliver (Male, UK), Jane (Female, UK) |
| WaveNet | | en-US-Wavenet-I (Male, US), en-US-Wavenet-B (Male, US), en-US-Wavenet-C (Female, US), en-US-Wavenet-F (Female, US), en-GB-Wavenet-B (Male, UK), en-GB-Wavenet-D (Male, UK), en-GB-Wavenet-C (Female, UK), en-GB-Wavenet-A (Female, UK) |
| XTTS v2 | | Susan (Female, US), Michael (Male, US), Abbey (Female, US), Alan (Male, US), Redd (Female, UK), Tom (Male, UK), Dave (Male, UK), Amy (Female, UK) |
| Zonos-v0.1 | | Abbey (Female, US), Susan (Female, US), Alan (Male, US), Michael (Male, US), Dave (Male, UK), Tom (Male, UK), Amy (Female, UK), Redd (Female, UK) |
| xAI Text to Speech | | Eve (Female, US), Ara (Female, US), Rex (Male, US), Leo (Male, US) |
Model and Provider Inclusion Criteria
Our objective is to analyze and compare popular and high-performing Text to Speech models and providers to support end-users in choosing which to use. As such, we apply an 'industry significance' and competitive performance test to evaluate the inclusion of new models and providers. We are in the process of refining these criteria and welcome any feedback and suggestions. To suggest models or providers, please contact us via the contact page.
Statement of Independence
Benchmarking is conducted with strict independence and objectivity. No compensation is received from any providers for listing or favorable outcomes on Artificial Analysis.