Multilingual AI Model Benchmark: Compare Leading LLMs by Language
Explore how leading large language models (LLMs) perform across multiple languages on Artificial Analysis' Multilingual Index, including the Global-MMLU-Lite benchmark. Filter by language and model, view trade-offs between accuracy, speed, and cost, and find the best LLM for your multilingual use case.
For details on datasets and methodology, see the FAQ page.
🌐 All (average) — Top Models
1. Gemini 3.1 Pro Preview — 93
2. Gemini 3 Pro Preview (high) — 92
3. Claude Opus 4.6 (max) — 92
4. Gemini 3 Flash — 91
5. Claude Opus 4.5 — 91
🇬🇧 English — Top Models
1. Claude Opus 4.6 (max) — 95
2. Gemini 3.1 Pro Preview — 95
3. Gemini 3 Flash — 95
4. Claude Opus 4.5 — 94
5. GPT-5.1 (high) — 94
🇨🇳 Chinese — Top Models
1. Claude Opus 4.6 (max) — 94
2. Gemini 3.1 Pro Preview — 94
3. Gemini 3 Pro Preview (high) — 94
4. Gemini 3 Flash — 93
5. Claude Opus 4.5 — 92
🇮🇳 Hindi — Top Models
1. Gemini 3.1 Pro Preview — 94
2. Claude Opus 4.6 (max) — 92
3. Gemini 3 Pro Preview (high) — 91
4. GPT-5 (high) — 91
5. Claude Opus 4.5 — 91
🇪🇸 Spanish — Top Models
1. Gemini 3.1 Pro Preview — 94
2. Gemini 3 Pro Preview (high) — 94
3. Gemini 3 Flash — 94
4. Claude Opus 4.6 (max) — 94
5. Claude Opus 4.5 — 94
🇫🇷 French — Top Models
1. Gemini 3.1 Pro Preview — 94
2. GPT-5.1 (high) — 94
3. Gemini 3 Pro Preview (high) — 93
4. Claude Opus 4.6 (max) — 93
5. GPT-5 (high) — 93
Other Languages — Top Model
- 🇸🇦 Arabic: Gemini 3.1 Pro Preview
- 🇧🇩 Bengali: Gemini 3.1 Pro Preview
- 🇵🇹 Portuguese: Gemini 3.1 Pro Preview
- 🇮🇩 Indonesian: Gemini 3.1 Pro Preview
- 🇯🇵 Japanese: Gemini 3.1 Pro Preview
- 🇰🇪 Swahili: Gemini 3.1 Pro Preview
- 🇩🇪 German: Gemini 3.1 Pro Preview
- 🇰🇷 Korean: Claude Opus 4.6 (max)
- 🇮🇹 Italian: Gemini 3.1 Pro Preview
- 🇳🇬 Yoruba: Gemini 3.1 Pro Preview
- 🇲🇲 Burmese: Gemini 3.1 Pro Preview
Overview
Artificial Analysis Multilingual Index; higher is better. Viewable for All (average), Chinese, English, Hindi, and Spanish. Reasoning models are indicated by a lightbulb icon.
Multilingual Index Across Languages (Normalized)
Scores are normalized per language across all models tested, where green represents the highest score for that language and red represents the lowest score for that language.
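The per-language normalization behind the green-to-red heatmap can be sketched as min-max scaling. This is an assumption: Artificial Analysis does not publish the exact formula on this page, and the function and variable names below are illustrative.

```python
def normalize_per_language(scores: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Min-max-normalize benchmark scores within each language.

    `scores` maps language -> {model: raw score}. Returned values lie in
    [0, 1], where 1.0 (rendered green) is the best model for that language
    and 0.0 (rendered red) is the worst.
    """
    normalized = {}
    for language, by_model in scores.items():
        lo, hi = min(by_model.values()), max(by_model.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all models tie
        normalized[language] = {m: (s - lo) / span for m, s in by_model.items()}
    return normalized

# Example using three scores from the French leaderboard above:
french = {"fr": {"Gemini 3.1 Pro Preview": 94,
                 "GPT-5.1 (high)": 94,
                 "Gemini 3 Pro Preview (high)": 93}}
print(normalize_per_language(french))
# The two 94-scoring models map to 1.0 (green); the 93 score maps to 0.0 (red)
```

Because scaling is done per language, a model's color reflects its rank within that language only, not its absolute score across languages.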
Multilingual Index
Multilingual Index: Average Across All Languages
Artificial Analysis Multilingual Index; average across all languages; higher is better.
Multilingual Index: Average vs. Output Speed
Artificial Analysis Multilingual Index vs. output speed (1,000 input tokens). The most attractive quadrant pairs a high index with high output speed. Providers shown: Anthropic, DeepSeek, Google, Meta, MiniMax, Mistral, OpenAI, and xAI.
Multilingual Index: Average vs. Price
Artificial Analysis Multilingual Index (average across all languages) vs. price. The most attractive quadrant pairs a high index with a low price. Providers shown: Anthropic, DeepSeek, Google, Meta, MiniMax, Mistral, OpenAI, and xAI.
Global-MMLU-Lite
Multilingual Global-MMLU-Lite: Average
Average across all languages; Higher is better
Pricing
Pricing: Input and Output Prices
Price in USD per 1M tokens; input and output prices are shown separately.
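As a rough illustration of how USD-per-1M-token pricing translates into the cost of a single request (the prices below are hypothetical placeholders, not values from the chart):

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request given per-1M-token input and output prices in USD."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Hypothetical prices: $2 per 1M input tokens, $10 per 1M output tokens.
cost = request_cost_usd(input_tokens=1_000, output_tokens=500,
                        input_price_per_m=2.0, output_price_per_m=10.0)
print(f"${cost:.4f}")  # $0.0070
```

Note that output tokens often dominate the bill for long responses, and reasoning models additionally bill their 'thinking' tokens at the output rate on most APIs.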
Speed & Latency
Output Speed
Output Tokens per Second; Higher is better
Latency: Time to First Answer Token
Seconds to first answer token received; accounts for reasoning models' 'thinking' time. The chart breaks latency into input processing and 'thinking' time (reasoning models, when applicable).
End-to-End Response Time
Seconds to output 500 tokens, including reasoning models' 'thinking' time; lower is better. The chart breaks response time into input processing time, 'thinking' time (reasoning models), and outputting time.
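The end-to-end figure decomposes roughly as time to first answer token (input processing plus any 'thinking' time) plus the time to stream the output. A sketch under that assumption, with illustrative numbers rather than measurements from the chart:

```python
def end_to_end_seconds(ttft_s: float, output_tokens: int,
                       tokens_per_second: float) -> float:
    """End-to-end response time: time to first answer token (including any
    reasoning 'thinking' time) plus time to stream the output tokens."""
    return ttft_s + output_tokens / tokens_per_second

# Illustrative values: 2 s to first answer token, 500 tokens at 100 tokens/s.
print(end_to_end_seconds(ttft_s=2.0, output_tokens=500,
                         tokens_per_second=100.0))  # 7.0
```

This is why a model with a high output speed can still feel slow end-to-end if its 'thinking' phase dominates the time to first answer token.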