April 3, 2026

MAI-Transcribe-1: Everything you need to know

Microsoft has released MAI-Transcribe-1, a speech transcription model that achieves 3.0% on AA-WER (#4 overall) and transcribes audio at roughly 69x real time

The model was developed by the Superintelligence team at Microsoft AI (MAI) and supports 25 languages, including English, French, Arabic, Japanese, and Chinese. The MAI-Transcribe-1 API is currently available in public preview via Azure Speech on Microsoft Foundry.

On the Artificial Analysis Speech to Text (STT) leaderboard, MAI-Transcribe-1 achieves a 3.0% word error rate on AA-WER (lower is better), placing it 4th overall behind Mistral’s Voxtral Small (2.9% AA-WER), Google’s Gemini 3.1 Pro High (2.9% AA-WER) and ElevenLabs’ Scribe v2 (2.3% AA-WER). It also stands out as one of the faster high-accuracy transcription models available, processing audio at ~69x real time.
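For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model's output, divided by the number of reference words. This is the textbook definition only; the function name, the lowercase-and-split normalization, and the sample sentences are assumptions for illustration, and the exact text normalization and dataset aggregation behind AA-WER are not described in this article.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Textbook WER: word-level edit distance / number of reference words.

    Assumption: simple lowercase + whitespace tokenization. Real benchmarks
    usually apply more careful text normalization before scoring.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"{wer:.1%}")
```

Read this way, a 3.0% score corresponds to roughly 3 word-level errors per 100 reference words.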

On speed, MAI-Transcribe-1 transcribes approximately 69 seconds of audio per second of processing, making it the fastest model in the top 5 by accuracy.

MAI-Transcribe-1 is available at $6 per 1,000 minutes of audio via Microsoft Foundry.
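To put the speed and pricing figures in practical terms, the sketch below estimates processing time and cost for a given amount of audio. The two constants come from the figures quoted above; the function name is hypothetical, and the calculation assumes the ~69x speed factor holds uniformly and that pricing scales linearly with no minimums, rounding, or volume tiers, which may not match actual billing.

```python
SPEED_FACTOR = 69.0            # ~69 seconds of audio per second of processing (from the article)
PRICE_PER_1000_MIN_USD = 6.0   # $6 per 1,000 minutes of audio (from the article)


def estimate_transcription(audio_minutes: float) -> tuple[float, float]:
    """Return (processing_seconds, cost_usd) for the given audio duration.

    Assumption: linear scaling for both speed and price.
    """
    processing_seconds = audio_minutes * 60 / SPEED_FACTOR
    cost_usd = audio_minutes * PRICE_PER_1000_MIN_USD / 1000
    return processing_seconds, cost_usd


# One hour of audio: about 52 seconds of processing and $0.36.
seconds, cost = estimate_transcription(60)
print(f"{seconds:.1f} s, ${cost:.2f}")
```

Under these assumptions the list price works out to $0.006 per minute of audio, or $0.36 per hour.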

See full results on the Artificial Analysis Speech to Text leaderboard: https://artificialanalysis.ai/speech-to-text