June 2, 2026

Microsoft has released MAI-Transcribe-1.5: an exceptionally fast speech transcription model with a speed factor of ~276x that still achieves 2.4% on AA-WER (#3), leading the accuracy-speed Pareto frontier

MAI-Transcribe-1.5 is the latest speech transcription model from Microsoft AI (MAI). It ranks 3rd overall on the Artificial Analysis Word Error Rate (AA-WER) leaderboard, behind Alibaba’s Fun-Realtime-ASR-preview (1.7% WER) and ElevenLabs Scribe v2 (2.2% WER). It is the fastest speech-to-text (STT) model among the 10 most accurate models, processing audio at ~276x real time, which is more than double the speed of the second-fastest model in that group.
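
AA-WER is a word error rate, so a 2.4% score means roughly 2.4 word errors per 100 reference words. Below is a minimal sketch of the standard WER calculation, assuming plain whitespace tokenization and no text normalization. Artificial Analysis's exact normalization and cross-dataset aggregation for AA-WER are not described here, and `word_error_rate` is a hypothetical helper name. It counts substitutions, deletions, and insertions with a word-level edit distance and divides by the number of reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference word count.

    Sketch only: assumes whitespace tokenization and no normalization
    (AA-WER's own normalization and aggregation are not modeled here).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

The function returns a fraction, so 0.024 corresponds to 2.4%. The example call returns 1/6 (one deleted word out of six reference words).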

The new model supports keyword biasing (improved recognition of rarer vocabulary such as names and medical terminology) and 43 languages, including English, French, Arabic, Japanese, and Chinese.

MAI-Transcribe-1.5 ranks 2nd on VoxPopuli-Cleaned-AA (1.6% WER), 4th on Earnings22-Cleaned-AA (4.0% WER), and 5th on AA-AgentTalk (2.0% WER).

MAI-Transcribe-1.5 is the fastest of the 10 most accurate models, leading the accuracy-speed Pareto frontier with a speed factor of ~276x.
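
A speed factor of ~276x means, in effect, that 276 seconds of audio are transcribed per second of processing. At that rate an hour of audio (3,600 s) takes about 3,600 / 276 ≈ 13 seconds. A model sits on the accuracy-speed Pareto frontier when no other model is at least as accurate and at least as fast while being strictly better on one of the two. The sketch below assumes this simple two-metric definition and does not model how Artificial Analysis actually times processing. `ModelResult`, `processing_seconds`, `dominates`, and `pareto_frontier` are hypothetical names. Apart from MAI-Transcribe-1.5's 2.4% / ~276x, every data point in it is an invented placeholder, not a leaderboard result.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    name: str
    wer_percent: float   # word error rate in percent; lower is better
    speed_factor: float  # seconds of audio transcribed per second of processing; higher is better


def processing_seconds(audio_seconds: float, speed_factor: float) -> float:
    """Approximate processing time for a clip at a given real-time speed factor."""
    return audio_seconds / speed_factor


def dominates(b: ModelResult, a: ModelResult) -> bool:
    """True if b is at least as good as a on both metrics and strictly better on one."""
    at_least_as_good = b.wer_percent <= a.wer_percent and b.speed_factor >= a.speed_factor
    strictly_better = b.wer_percent < a.wer_percent or b.speed_factor > a.speed_factor
    return at_least_as_good and strictly_better


def pareto_frontier(results: list[ModelResult]) -> list[ModelResult]:
    """Keep models that no other model dominates, ordered from most to least accurate."""
    frontier = [a for a in results if not any(dominates(b, a) for b in results)]
    return sorted(frontier, key=lambda r: r.wer_percent)


# Only the MAI-Transcribe-1.5 figures come from the article; the others are invented placeholders.
models = [
    ModelResult("MAI-Transcribe-1.5", 2.4, 276.0),
    ModelResult("Hypothetical model A", 1.7, 40.0),
    ModelResult("Hypothetical model B", 2.2, 100.0),
    ModelResult("Hypothetical model C", 3.0, 120.0),  # less accurate and slower than MAI-Transcribe-1.5
    ModelResult("Hypothetical model D", 5.0, 400.0),
]

frontier_names = [m.name for m in pareto_frontier(models)]
print(frontier_names)
print(f"1 hour of audio at ~276x: {processing_seconds(3600, 276):.1f} s")
```

With these placeholder numbers, model C is dropped because MAI-Transcribe-1.5 is both more accurate and faster, while models that trade accuracy for speed in either direction stay on the frontier.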

MAI-Transcribe-1.5 is available at $6 per 1,000 minutes of audio via Microsoft Foundry.
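
At $6 per 1,000 minutes, the price works out to $0.006 per minute, or $0.36 per hour of audio. Here is a minimal sketch of that arithmetic. It assumes linear per-minute billing with no minimums, rounding increments, or other Foundry charges, so check Microsoft's current pricing before relying on it. `transcription_cost_usd` is a hypothetical helper name.

```python
# List price stated in the article; assumes purely linear per-minute billing.
PRICE_PER_1000_MINUTES_USD = 6.0


def transcription_cost_usd(
    audio_minutes: float,
    price_per_1000_minutes: float = PRICE_PER_1000_MINUTES_USD,
) -> float:
    """Estimated cost in USD for a given amount of audio (no minimums or extra fees modeled)."""
    # Multiply before dividing to keep round inputs exact in floating point.
    return audio_minutes * price_per_1000_minutes / 1000


print(transcription_cost_usd(60))       # one hour of audio
print(transcription_cost_usd(600_000))  # 10,000 hours of audio
```

Under these assumptions, one hour of audio costs $0.36 and 10,000 hours cost $3,600.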

See full results on the Artificial Analysis Speech to Text leaderboard: https://artificialanalysis.ai/speech-to-text

See our new Streaming Speech to Text leaderboard: https://artificialanalysis.ai/speech-to-text/streaming