MAI-Transcribe-1: Everything you need to know

Microsoft has released MAI-Transcribe-1, a speech transcription model that scores 3.0% on AA-WER (4th overall) and processes audio at about 69x real time

The model was developed by Microsoft AI (MAI)’s Superintelligence team and supports 25 languages, including English, French, Arabic, Japanese, and Chinese. The MAI-Transcribe-1 API is currently available in public preview via Azure Speech on Microsoft Foundry.

On the Artificial Analysis Speech to Text (STT) leaderboard, MAI-Transcribe-1 achieves a 3.0% word error rate on AA-WER, the leaderboard's measure of transcription accuracy. That places it 4th overall, behind ElevenLabs’ Scribe v2 (2.3% AA-WER), Google’s Gemini 3.1 Pro High (2.9% AA-WER) and Mistral’s Voxtral Small (2.9% AA-WER). It is also one of the faster high-accuracy transcription models available.
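For readers unfamiliar with the metric, the following is a minimal sketch of how a plain word error rate is conventionally computed: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the model's output, divided by the number of reference words. The function name and the simple lowercase-and-split normalization are illustrative assumptions; the exact text normalization and dataset weighting behind AA-WER are not described here and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Plain WER: word-level edit distance divided by reference length.

    Normalization here (lowercase + whitespace split) is an assumption for
    illustration, not the AA-WER methodology.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from output
            insertion = curr[j - 1] + 1   # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Under this definition, a 3.0% score corresponds to roughly three word-level errors per 100 reference words.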

On speed, MAI-Transcribe-1 transcribes approximately 69 seconds of audio per second of processing, making it the fastest of the five most accurate models on the leaderboard.

MAI-Transcribe-1 is available at $6 per 1,000 minutes of audio ($0.006 per minute) via Microsoft Foundry.
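To put the speed and price figures in practical terms, here is a rough back-of-the-envelope sketch. It assumes the ~69x speed factor and the $6 per 1,000 minutes price apply linearly to any audio length, and it ignores network latency, queueing and any billing granularity or minimums, none of which are covered here. The function and constant names are illustrative.

```python
# Figures quoted in the article; treated here as simple linear rates (assumption).
SPEED_FACTOR = 69.0               # seconds of audio transcribed per second of processing
PRICE_PER_1000_MIN_USD = 6.0      # US dollars per 1,000 minutes of audio


def processing_seconds(audio_seconds: float, speed_factor: float = SPEED_FACTOR) -> float:
    """Estimated processing time for a clip of the given length."""
    return audio_seconds / speed_factor


def cost_usd(audio_minutes: float, price_per_1000_min: float = PRICE_PER_1000_MIN_USD) -> float:
    """Estimated transcription cost for the given number of audio minutes."""
    return audio_minutes * price_per_1000_min / 1000.0


if __name__ == "__main__":
    one_hour_seconds = 60 * 60
    print(f"1 hour of audio: ~{processing_seconds(one_hour_seconds):.1f} s of processing")
    print(f"1 hour of audio: ~${cost_usd(60):.2f}")
```

Under these assumptions, an hour of audio would take a little under a minute to process and cost about $0.36.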

See full results on the Artificial Analysis Speech to Text leaderboard: https://artificialanalysis.ai/speech-to-text
