Multilingual AI Model Benchmark: Compare Leading LLMs by Language
Top models — 🌐 All (average)
  #1  Gemini 3.1 Pro Preview       93
  #2  Gemini 3 Pro Preview (high)  92
  #3  Claude Opus 4.6 (max)        92
  #4  Gemini 3 Flash               91
  #5  Claude Opus 4.5              91
Top models — 🇬🇧 English
  #1  Claude Opus 4.6 (max)        95
  #2  Gemini 3.1 Pro Preview       95
  #3  Gemini 3 Flash               95
  #4  Claude Opus 4.5              94
  #5  GPT-5.1 (high)               94
Top models — 🇨🇳 Chinese
  #1  Claude Opus 4.6 (max)        94
  #2  Gemini 3.1 Pro Preview       94
  #3  Gemini 3 Pro Preview (high)  94
  #4  Gemini 3 Flash               93
  #5  Claude Opus 4.5              92
Top models — 🇮🇳 Hindi
  #1  Gemini 3.1 Pro Preview       94
  #2  Claude Opus 4.6 (max)        92
  #3  Gemini 3 Pro Preview (high)  91
  #4  GPT-5 (high)                 91
  #5  Claude Opus 4.5              91
Top models — 🇪🇸 Spanish
  #1  Gemini 3.1 Pro Preview       94
  #2  Gemini 3 Pro Preview (high)  94
  #3  Gemini 3 Flash               94
  #4  Claude Opus 4.6 (max)        94
  #5  Claude Opus 4.5              94
Top models — 🇫🇷 French
  #1  Gemini 3.1 Pro Preview       94
  #2  GPT-5.1 (high)               94
  #3  Gemini 3 Pro Preview (high)  93
  #4  Claude Opus 4.6 (max)        93
  #5  GPT-5 (high)                 93
Top models in other languages
  🇸🇦 Arabic      Gemini 3.1 Pro Preview
  🇧🇩 Bengali     Gemini 3.1 Pro Preview
  🇵🇹 Portuguese  Gemini 3.1 Pro Preview
  🇮🇩 Indonesian  Gemini 3.1 Pro Preview
  🇯🇵 Japanese    Gemini 3.1 Pro Preview
  🇰🇪 Swahili     Gemini 3.1 Pro Preview
  🇩🇪 German      Gemini 3.1 Pro Preview
  🇰🇷 Korean      Claude Opus 4.6 (max)
  🇮🇹 Italian     Gemini 3.1 Pro Preview
  🇳🇬 Yoruba      Gemini 3.1 Pro Preview
  🇲🇲 Burmese     Gemini 3.1 Pro Preview
Explore how leading large language models (LLMs) perform across multiple languages on Artificial Analysis' Multilingual Index, including the Global-MMLU-Lite benchmark. Filter by language and model, view trade-offs between accuracy, speed, and cost, and find the best LLM for your multilingual use case.
For details on datasets and methodology, see the FAQ page.
Artificial Analysis Multilingual Index
Higher is better. Filterable by language: All (average), Chinese, English, Hindi, Spanish.
Reasoning models are indicated by a lightbulb icon.
Multilingual Index Across Languages (Normalized)
Scores are normalized per language across all models tested: green represents the highest score for that language and red represents the lowest.
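The per-language normalization described above can be sketched as min-max scaling. This is an assumption about the exact scheme: the page only states that green maps to the highest score for a language and red to the lowest. The model names and scores below are illustrative, not the benchmark's data.

```python
# Hypothetical scores: {model: {language: raw score}} — illustrative numbers only.
scores = {
    "Model A": {"English": 95, "Hindi": 94},
    "Model B": {"English": 94, "Hindi": 91},
    "Model C": {"English": 92, "Hindi": 85},
}

def normalize_per_language(scores):
    """Min-max normalize each language's scores to [0, 1] across all models,
    so 1.0 (green) is the best model for that language and 0.0 (red) the worst."""
    languages = {lang for per_model in scores.values() for lang in per_model}
    normalized = {model: {} for model in scores}
    for lang in languages:
        column = [per_model[lang] for per_model in scores.values() if lang in per_model]
        lo, hi = min(column), max(column)
        for model, per_model in scores.items():
            if lang in per_model:
                # Guard against a zero range when all models tie on a language.
                normalized[model][lang] = (per_model[lang] - lo) / (hi - lo) if hi != lo else 1.0
    return normalized

norm = normalize_per_language(scores)
```

Because scaling is done per language, a model can lead one language's column while trailing another's, which is exactly the pattern the colored grid is meant to surface.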
Multilingual Index: Average Across All Languages
Artificial Analysis Multilingual Index; Average across all languages; Higher is better
Multilingual Index: Average vs. Output Speed
Artificial Analysis Multilingual Index; 1,000 input tokens
[Scatter plot of models from Anthropic, DeepSeek, Google, Meta, MiniMax, Mistral, OpenAI, and xAI; the most attractive quadrant pairs a high index score with high output speed.]
Multilingual Index: Average vs. Price
Artificial Analysis Multilingual Index; Average across all languages
[Scatter plot of models from Anthropic, DeepSeek, Google, Meta, MiniMax, Mistral, OpenAI, and xAI; the most attractive quadrant pairs a high index score with a low price.]
Multilingual Global-MMLU-Lite: Average
Average across all languages; Higher is better
Pricing: Input and Output Prices
Price: USD per 1M tokens; input and output prices are shown separately.
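Because input and output tokens are priced separately per million tokens, the cost of a single request combines the two rates. A minimal sketch, using hypothetical prices rather than any model's actual pricing:

```python
def request_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost of one request, given prices quoted in USD per 1M tokens."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Hypothetical prices: $2 per 1M input tokens, $10 per 1M output tokens.
cost = request_cost_usd(1_000, 500, 2.0, 10.0)
# 1,000 input tokens -> $0.002; 500 output tokens -> $0.005; total $0.007
```

Output tokens are typically priced several times higher than input tokens, so long responses (and reasoning models' hidden "thinking" tokens, where billed as output) can dominate the bill even for short prompts.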
Output Speed
Output Tokens per Second; Higher is better
Latency: Time To First Answer Token
Seconds to First Answer Token Received; accounts for reasoning model 'thinking' time. Bars break down into input processing and, for reasoning models where applicable, 'thinking' time.
End-to-End Response Time
Seconds to Output 500 Tokens, including reasoning model 'thinking' time; Lower is better. Bars break down into input processing time, 'thinking' time (reasoning models), and outputting time.
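The components above combine straightforwardly: under the page's definition (500 output tokens), end-to-end response time is roughly the time to the first answer token, which already includes input processing and any 'thinking' time, plus the time to stream the remaining output at the measured output speed. A sketch with hypothetical latency and speed figures:

```python
def end_to_end_seconds(ttfat_s, output_tokens, output_tokens_per_s):
    """End-to-end response time: time to first answer token (which includes
    input processing and any reasoning 'thinking' time) plus the time to
    stream the output at the measured output speed."""
    return ttfat_s + output_tokens / output_tokens_per_s

# Hypothetical model: 2.0 s to first answer token, 100 output tokens/s.
total = end_to_end_seconds(2.0, 500, 100.0)  # 2.0 + 5.0 = 7.0 seconds
```

This is why a reasoning model with fast token generation can still feel slow for short answers: a long 'thinking' phase inflates the first term, while output speed only governs the second.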