Speech Arena Leaderboard | Artificial Analysis

Compare TTS models using each provider's own native voices.

Frequently Asked Questions

Sonic 3.5 currently leads the Text to Speech Arena with an Elo score of 1216.

The top Text to Speech models by Elo rating are: 1. Sonic 3.5 (Elo 1216), 2. Gemini 3.1 Flash TTS (Elo 1216), 3. Fun-Realtime-TTS (Elo 1211), 4. Realtime TTS-2 - Research Preview (Elo 1202), 5. xAI Text to Speech (Elo 1197). Rankings are based on blind user votes in the Speech Arena.

Models are ranked using an Elo rating system derived from user votes in blind comparisons in the Speech Arena. Users listen to pairs of speech samples generated from the same text and choose which sounds more natural. A higher Elo score indicates that listeners prefer a model's speech more often.
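To illustrate how pairwise votes can turn into ratings, here is a minimal sketch of the classic Elo update. The page does not specify how the Speech Arena computes its scores, so the K-factor of 32, the starting rating of 1000, and the function names below are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    rating_a: float, rating_b: float, a_won: bool, k: float = 32.0
) -> Tuple[float, float]:
    """Return the new (rating_a, rating_b) after a single blind vote.

    k is an assumed K-factor; the Arena's actual value is not stated.
    """
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


def rate_votes(
    votes: Iterable[Tuple[str, str]], initial: float = 1000.0, k: float = 32.0
) -> Dict[str, float]:
    """Process (winner, loser) votes in order and return the final ratings.

    initial is an assumed starting rating for models not seen before.
    """
    ratings: Dict[str, float] = {}
    for winner, loser in votes:
        r_win = ratings.get(winner, initial)
        r_lose = ratings.get(loser, initial)
        ratings[winner], ratings[loser] = update_elo(r_win, r_lose, True, k)
    return ratings
```

Under this formula, a model rated 1216 would be expected to win roughly 70% of blind comparisons against a model rated 1064, which gives a sense of what a gap of about 150 Elo points means in practice.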

Kokoro 82M v1.0 is the most affordable model at $0.65 per 1M characters, with an Elo score of 1064. Other affordable options include StyleTTS 2 at $2.82 per 1M characters.
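Because prices are quoted per 1M characters, the cost of a given job scales linearly with its character count. The sketch below uses the two prices quoted above; the dictionary and function names are hypothetical, and real billing may add minimums or round differently.

```python
# Prices in USD per 1M characters, as quoted on the leaderboard.
PRICE_PER_MILLION_CHARS = {
    "Kokoro 82M v1.0": 0.65,
    "StyleTTS 2": 2.82,
}


def synthesis_cost(model: str, num_chars: int) -> float:
    """Estimated USD cost of synthesizing num_chars characters with model."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return PRICE_PER_MILLION_CHARS[model] * num_chars / 1_000_000
```

For example, `synthesis_cost("Kokoro 82M v1.0", 2_000_000)` estimates the cost of two million characters at the quoted Kokoro rate.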

Fish Audio S2 Pro is the highest-ranked open weights model on the Text to Speech Leaderboard with an Elo score of 1115. There are 15 open weights models out of 85 total.

The top 5 open weights Text to Speech models are: 1. Fish Audio S2 Pro (Elo 1115), 2. Step Audio EditX (Mar 2026) (Elo 1115), 3. Voxtral TTS (Elo 1076), 4. Kokoro 82M v1.0 (Elo 1064), 5. Magpie-Multilingual 357M (Feb 2026) (Elo 1059).

You can filter by the following categories: Knowledge Sharing, Assistants, Entertainment, and Customer Service, and the following accents: US and UK.