Multilingual AI Model Benchmark: Compare Leading LLMs by Language
Explore how leading large language models (LLMs) perform across multiple languages on Artificial Analysis' Multilingual Index, including the Global-MMLU-Lite benchmark. Filter by language and model, view trade-offs between accuracy, speed, and cost, and find the best LLM for your multilingual use case.
For details on datasets and methodology, see the FAQ page.
All (average)
Top Models
#1 Gemini 3.1 Pro Preview: 93
#2 Gemini 3 Pro Preview (high): 92
#3 Claude Opus 4.6 (max): 92
#4 Gemini 3 Flash: 91
#5 Claude Opus 4.5: 91
English
Top Models
#1 Claude Opus 4.6 (max): 95
#2 Gemini 3.1 Pro Preview: 95
#3 Gemini 3 Flash: 95
#4 Claude Opus 4.5: 94
#5 GPT-5.1 (high): 94
Chinese
Top Models
#1 Claude Opus 4.6 (max): 94
#2 Gemini 3.1 Pro Preview: 94
#3 Gemini 3 Pro Preview (high): 94
#4 Gemini 3 Flash: 93
#5 Claude Opus 4.5: 92
Hindi
Top Models
#1 Gemini 3.1 Pro Preview: 94
#2 Claude Opus 4.6 (max): 92
#3 Gemini 3 Pro Preview (high): 91
#4 GPT-5 (high): 91
#5 Claude Opus 4.5: 91
Spanish
Top Models
#1 Gemini 3.1 Pro Preview: 94
#2 Gemini 3 Pro Preview (high): 94
#3 Gemini 3 Flash: 94
#4 Claude Opus 4.6 (max): 94
#5 Claude Opus 4.5: 94
French
Top Models
#1 Gemini 3.1 Pro Preview: 94
#2 GPT-5.1 (high): 94
#3 Gemini 3 Pro Preview (high): 93
#4 Claude Opus 4.6 (max): 93
#5 GPT-5 (high): 93
Other Countries
Arabic: Gemini 3.1 Pro Preview
Bengali: Gemini 3.1 Pro Preview
Portuguese: Gemini 3.1 Pro Preview
Indonesian: Gemini 3.1 Pro Preview
Japanese: Gemini 3.1 Pro Preview
Swahili: Gemini 3.1 Pro Preview
German: Gemini 3.1 Pro Preview
Korean: Claude Opus 4.6 (max)
Italian: Gemini 3.1 Pro Preview
Yoruba: Gemini 3.1 Pro Preview
Burmese: Gemini 3.1 Pro Preview
Overview
Artificial Analysis Multilingual Index · Higher is better
Languages shown: All (average), Chinese, English, Hindi, Spanish
Reasoning models are indicated by a lightbulb icon.
Multilingual Index Across Languages (Normalized)
Scores are normalized per language across all models tested, where green represents the highest score for that language and red represents the lowest score for that language.
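The per-language normalization described above can be sketched as a min-max rescaling, where the best model for a language maps to 1.0 (green) and the worst to 0.0 (red). This is an illustration under assumptions: the page does not state the exact formula, the function name `normalize_per_language` is hypothetical, and the sample uses only a few of the top scores listed on this page rather than all models tested.

```python
from typing import Dict

Scores = Dict[str, Dict[str, float]]  # scores[language][model] -> raw score


def normalize_per_language(scores: Scores) -> Scores:
    """Rescale each language's scores to [0, 1] using min-max normalization.

    Assumption: min-max scaling is used here for illustration only; the
    source page does not specify how its normalization is computed.
    """
    normalized: Scores = {}
    for language, by_model in scores.items():
        lowest = min(by_model.values())
        highest = max(by_model.values())
        span = highest - lowest
        normalized[language] = {
            # If every model ties, treat them all as top-scoring.
            model: (value - lowest) / span if span else 1.0
            for model, value in by_model.items()
        }
    return normalized


# Sample data: a subset of the top-five scores shown on this page.
sample = {
    "Hindi": {
        "Gemini 3.1 Pro Preview": 94,
        "Claude Opus 4.6 (max)": 92,
        "Gemini 3 Pro Preview (high)": 91,
    },
    "Chinese": {
        "Claude Opus 4.6 (max)": 94,
        "Gemini 3 Flash": 93,
        "Claude Opus 4.5": 92,
    },
}

normalized = normalize_per_language(sample)
for language, by_model in normalized.items():
    for model, value in by_model.items():
        print(f"{language:8s} {model:30s} {value:.2f}")
```

Because each language is rescaled on its own, a normalized value says how a model ranks within that language, not how its raw score compares across languages.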
Multilingual Index
Multilingual Index: Average Across All Languages
Artificial Analysis Multilingual Index · Average across all languages · Higher is better
Multilingual Index: Average vs. Output Speed
Artificial Analysis Multilingual Index · 1,000 input tokens
Most attractive quadrant
Providers shown: Anthropic, Meta, MiniMax, Mistral, OpenAI
Multilingual Index: Average vs. Price
Artificial Analysis Multilingual Index · Average across all languages
Most attractive quadrant
Providers shown: Anthropic, DeepSeek, Google, Meta, MiniMax, Mistral, OpenAI, xAI
Global-MMLU-Lite
Multilingual Global-MMLU-Lite: Average
Average across all languages · Higher is better
Pricing
Pricing: Cache Hit, Input, and Output
Price (USD per M Tokens)
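As a worked example of reading per-million-token prices, the sketch below converts cache hit, input, and output prices into the cost of a single request. The function name `request_cost_usd` and all prices in the usage lines are hypothetical placeholders, not figures from this page, and the assumption that cached input tokens are billed at the cache hit price instead of the input price may not match every provider.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
    cached_input_tokens: int = 0,
    cache_hit_price: float = 0.0,
) -> float:
    """Estimate the cost of one request from prices quoted in USD per 1M tokens.

    Assumption: cached input tokens are billed at cache_hit_price and the
    remaining input tokens at input_price.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_input_tokens
    total_per_million = (
        uncached_tokens * input_price
        + cached_input_tokens * cache_hit_price
        + output_tokens * output_price
    )
    return total_per_million / 1_000_000


# Hypothetical prices: $2 per 1M input tokens, $10 per 1M output tokens,
# $0.50 per 1M cached input tokens.
no_cache = request_cost_usd(1_000, 500, input_price=2.0, output_price=10.0)
full_cache = request_cost_usd(
    1_000, 500, input_price=2.0, output_price=10.0,
    cached_input_tokens=1_000, cache_hit_price=0.5,
)
print(f"without cache: ${no_cache:.4f}, fully cached input: ${full_cache:.4f}")
```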
Speed & Latency
Output Speed
Output tokens per second · Higher is better
Latency: Time To First Answer Token
Seconds to first answer token received · Accounts for reasoning model 'thinking' time
End-to-End Response Time
Seconds to output 500 tokens, including reasoning model 'thinking' time · Lower is better
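One plausible way to relate the speed and latency metrics above is to approximate end-to-end response time as the time to first answer token (which already includes any 'thinking' time) plus the time to generate 500 answer tokens at the measured output speed. This decomposition is an assumption made for illustration; the page does not give the formula, and the function name and sample values below are hypothetical.

```python
def end_to_end_seconds(
    time_to_first_answer_token_s: float,
    output_tokens_per_s: float,
    answer_tokens: int = 500,
) -> float:
    """Approximate seconds to receive a full answer of `answer_tokens` tokens.

    Assumption: total time = time to first answer token (including reasoning
    'thinking' time) + answer_tokens / output speed.
    """
    if output_tokens_per_s <= 0:
        raise ValueError("output_tokens_per_s must be positive")
    generation_time_s = answer_tokens / output_tokens_per_s
    return time_to_first_answer_token_s + generation_time_s


# Hypothetical measurements: 2 s to first answer token, 100 output tokens/s.
total = end_to_end_seconds(2.0, 100.0)
print(f"estimated end-to-end response time: {total:.1f} s")
```

Under this approximation, a reasoning model with a high output speed can still have a long end-to-end time if its thinking phase dominates the time to first answer token.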