Step TTS 2 Quality ELO, Speed & Price Analysis
Analysis of StepFun's Step TTS 2 model and how it compares with other Text to Speech models on three key metrics: quality ELO, speed, and price.
For further details, see our methodology page.
Quality
Text to Speech Arena Quality ELO
Relative ELO score of each model, derived from user votes in Artificial Analysis' Speech Arena. Models that do not yet have enough votes may not be shown.
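To illustrate how a relative score can be derived from head-to-head votes, here is a minimal sketch of a textbook Elo update. This is not necessarily how the Speech Arena computes its scores: the K-factor of 32, the 400-point scale, and the function names are assumptions chosen for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one vote.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k=32 is an assumed K-factor, not a value taken from the source.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Whatever A gains, B loses, so the rating total is conserved.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two models start level; a user prefers the first one's audio.
    print(update_elo(1000.0, 1000.0, 1.0))  # (1016.0, 984.0)
```

Because each vote moves ratings only by a bounded amount, a model with few votes has an unstable score, which is one reason a leaderboard might hide it until more votes arrive.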
Pricing
Price
Price per 1M characters of text. For details on how we calculate price for providers that charge by inference time or through subscription plans, see our methodology page.
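As a rough sketch of what normalizing to a price per 1M characters can look like, the snippet below converts a per-character cost and a time-based rate to the same unit. The function names and the conversion for time-based billing are assumptions for illustration; the methodology page remains the authority on the actual calculation.

```python
def price_per_million_chars(total_cost_usd: float, characters: int) -> float:
    """Scale the cost of synthesizing `characters` characters to 1M characters."""
    if characters <= 0:
        raise ValueError("characters must be positive")
    return total_cost_usd / characters * 1_000_000


def price_from_time_billing(usd_per_second: float, chars_per_second: float) -> float:
    """Assumed conversion for providers billed by inference time.

    Cost per character is (USD per second) / (characters per second),
    which is then scaled to 1M characters.
    """
    if chars_per_second <= 0:
        raise ValueError("chars_per_second must be positive")
    return usd_per_second / chars_per_second * 1_000_000


if __name__ == "__main__":
    # Hypothetical figures: $0.03 for 2,000 characters of text.
    print(price_per_million_chars(0.03, 2000))
    # Hypothetical figures: $0.001 per second at 100 characters per second.
    print(price_from_time_billing(0.001, 100.0))
```

A subscription plan could be handled with the first function by dividing the plan fee by the number of characters it includes, though that too is an assumption rather than the documented method.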
Speed Factor
Characters Per Second
Number of characters processed per second of generation time. Higher values indicate faster generation speeds.
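The metric can be sketched as the input length divided by wall-clock generation time. In the snippet below, `synthesize` is a hypothetical stand-in for any text to speech call, not a real provider API, and timing the whole call end to end is an assumption about how generation time is measured.

```python
import time
from typing import Callable


def characters_per_second(num_characters: int, generation_seconds: float) -> float:
    """Characters of input text processed per second of generation time."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return num_characters / generation_seconds


def measure(synthesize: Callable[[str], bytes], text: str) -> float:
    """Time one call to `synthesize` (hypothetical TTS function) and return chars/sec."""
    start = time.perf_counter()
    synthesize(text)
    elapsed = time.perf_counter() - start
    return characters_per_second(len(text), elapsed)


if __name__ == "__main__":
    # 500 characters generated in 2 seconds.
    print(characters_per_second(500, 2.0))  # 250.0
```

A single timed call is noisy, so in practice one would likely average over many requests and several text lengths.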
Text to Speech models & providers compared
AsyncFlow V2, Azure Neural, Chirp 3: HD, ElevenLabs v3, Fish Speech 1.5, Flash v2.5, Inworld TTS 1, Inworld TTS 1 Max, Journey, Kokoro 82M v1.0, LMNT, Magpie Multilingual, MetaVoice v1, Multilingual v2, Murf Speech Gen 2, Neural2, Octave 2, Octave TTS, OpenAudio S1, OpenVoice v2, Polly Generative, Polly Long-Form, Polly Neural, Polly Standard, Qwen3 TTS Flash, Simba, Sonic 3, Sonic English (Oct '24), Speech 2.6 HD, Speech 2.6 Turbo, Speech-02-HD, Speech-02-Turbo, Standard, Step TTS 2, Step TTS Mini, Studio, StyleTTS 2, T2A-01-HD, T2A-01-Turbo, TTS-1, TTS-1 HD, Turbo v2.5, VibeVoice 7B, WaveNet, XTTS v2, Zonos-v0.1.