Question 1

Which is the best Text to Speech AI model?

Accepted Answer

Gemini 3.1 Flash TTS currently leads the Text to Speech Arena with an Elo score of 1215.

Question 2

What are the top Text to Speech models?

Accepted Answer

The top Text to Speech models by Elo rating are:

1. Gemini 3.1 Flash TTS (Elo 1215)
2. Sonic 3.5 (Elo 1209)
3. Fun-Realtime-TTS (Elo 1206)
4. Realtime TTS-2 - Research Preview (Elo 1201)
5. Realtime TTS 1.5 Max (Elo 1199)

Rankings are based on blind user votes in the Speech Arena.

Question 3

How are Text to Speech models ranked on this leaderboard?

Accepted Answer

Models are ranked with an Elo rating system derived from user votes in blind comparisons in the Speech Arena. Users listen to a pair of speech samples generated from the same text, without knowing which model produced which, and choose the one that sounds more natural. A higher Elo score means listeners prefer that model's speech more often.
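
To make the idea concrete, here is a minimal sketch of a classic Elo update after a single blind vote. The leaderboard does not state its exact formula, so the logistic curve with a 400-point scale and the K-factor of 32 are textbook defaults assumed here, and the function names are hypothetical. Treat it as an illustration of how a vote moves two ratings, not as the Speech Arena's actual method.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under classic Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one pairwise vote.

    k=32 is an assumed K-factor; the leaderboard's real value is not published
    in this FAQ.
    """
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating pool stays constant.
    return rating_a + delta, rating_b - delta


# Two equally rated models: the winner of the vote gains what the loser drops.
new_a, new_b = update_elo(1200.0, 1200.0, a_won=True)
print(new_a, new_b)  # 1216.0 1184.0
```

Under this formula, a small rating gap corresponds to a win probability only slightly above 50%, so models a few Elo points apart are close to evenly matched in listener preference.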

Question 4

Which Text to Speech model is the cheapest?

Accepted Answer

Kokoro 82M v1.0 is the most affordable at $0.65 per 1M characters with an Elo score of 1059. Other affordable options include StyleTTS 2 at $2.82 per 1M characters.
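
For a rough sense of what these per-character prices mean in practice, the sketch below converts a price per 1M characters into an estimated cost for a given amount of text. It assumes simple linear billing with no minimum charge, free tier, or volume discount, which this FAQ does not confirm, and the function name is hypothetical.

```python
def tts_cost_usd(num_chars: int, price_per_million_chars: float) -> float:
    """Estimated cost in USD, assuming purely linear per-character billing."""
    return num_chars / 1_000_000 * price_per_million_chars


# Estimated cost of synthesizing 250,000 characters at the two quoted prices.
print(f"{tts_cost_usd(250_000, 0.65):.4f}")  # 0.1625
print(f"{tts_cost_usd(250_000, 2.82):.4f}")  # 0.7050
```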

Question 5

Which is the best open weights Text to Speech model?

Accepted Answer

Step Audio EditX (Mar 2026) is the highest-ranked open weights model on the Text to Speech Leaderboard, with an Elo score of 1114. Of the 88 models on the leaderboard, 15 are open weights.

Question 6

What are the top 5 open weights Text to Speech models?

Accepted Answer

The top 5 open weights Text to Speech models are:

1. Step Audio EditX (Mar 2026) (Elo 1114)
2. Fish Audio S2 Pro (Elo 1110)
3. Voxtral TTS (Elo 1076)
4. Kokoro 82M v1.0 (Elo 1059)
5. Maya1 (Elo 1051)

Question 7

What categories and accents can I filter by?

Accepted Answer

You can filter by category (Knowledge Sharing, Assistants, Entertainment, and Customer Service) and by accent (US and UK).

Speech Arena Leaderboard, Artificial Analysis