
Multilingual AI Model Benchmark: Compare Leading LLMs by Language

Last updated: January 28, 2026

Explore how leading large language models (LLMs) perform across multiple languages on Artificial Analysis' Multilingual Index, including the Global-MMLU-Lite benchmark. Filter by language and model, view trade-offs between accuracy, speed, and cost, and find the best LLM for your multilingual use case.

For details on datasets and methodology, see the FAQ page.

Artificial Analysis Multilingual Index

Higher is better. Filter by language: All (average), Chinese, English, Hindi, or Spanish.

An index assessing multilingual performance in general reasoning across multiple languages. Results are computed across English, Chinese, Hindi, Spanish, French, Arabic, Bengali, Portuguese, Indonesian, Japanese, Swahili, German, Korean, Italian, Yoruba, and Burmese. See the Multilingual Intelligence Index methodology for further details.

Multilingual Index Across Languages (Normalized)

Scores are normalized per language across all models tested, where green represents the highest score for that language and red represents the lowest. Reasoning models are flagged in the chart.
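The exact scaling used isn't stated on this page; a minimal sketch, assuming simple per-language min-max normalization (consistent with the green-to-red coloring described above; all scores below are made-up illustration values):

```python
# Sketch of per-language normalization, assuming min-max scaling.
# The actual normalization scheme is described in the methodology.

def normalize_per_language(scores):
    """scores[model][language] -> raw score; returns values scaled to [0, 1] per language."""
    languages = {lang for per_model in scores.values() for lang in per_model}
    normalized = {model: {} for model in scores}
    for lang in languages:
        values = [m[lang] for m in scores.values() if lang in m]
        lo, hi = min(values), max(values)
        for model, per_model in scores.items():
            if lang in per_model:
                # If every model scores identically on a language, treat all as top.
                normalized[model][lang] = (per_model[lang] - lo) / (hi - lo) if hi > lo else 1.0
    return normalized

example = {
    "Model A": {"English": 92.0, "Hindi": 71.0},
    "Model B": {"English": 88.0, "Hindi": 80.0},
}
print(normalize_per_language(example))
# Model A: English 1.0 (green), Hindi 0.0 (red); Model B: the reverse.
```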


{"@context":"https://schema.org","@type":"Dataset","name":"Multilingual Index Across Languages (Normalized)","creator":{"@type":"Organization","name":"Artificial Analysis","url":"https://artificialanalysis.ai"},"description":"Scores are normalized per language across all models tested, where green represents the highest score for that language and red represents the lowest score for that language.","measurementTechnique":"Independent test run by Artificial Analysis on dedicated hardware.","spatialCoverage":"Worldwide","keywords":["analytics","llm","AI","benchmark","model","gpt","claude"],"license":"https://creativecommons.org/licenses/by/4.0/","isAccessibleForFree":true,"citation":"Artificial Analysis (2025). LLM benchmarks dataset. https://artificialanalysis.ai","data":"label,multilingual,detailsUrl\nGemini 3 Pro Preview (high),[object Object],/models/gemini-3-pro/providers\nGemini 3 Flash,[object Object],/models/gemini-3-flash-reasoning/providers\nClaude Opus 4.5,[object Object],/models/claude-opus-4-5-thinking/providers\nGPT-5 (high),[object Object],/models/gpt-5/providers\nClaude Opus 4.5,[object Object],/models/claude-opus-4-5/providers\nGPT-5.1 (high),[object Object],/models/gpt-5-1/providers\nGPT-5 (medium),[object Object],/models/gpt-5-medium/providers\nGemini 2.5 Pro,[object Object],/models/gemini-2-5-pro/providers\nGPT-5.2 (medium),[object Object],/models/gpt-5-2-medium/providers\nGPT-5.1 Codex (high),[object Object],/models/gpt-5-1-codex/providers\nGrok 4,[object Object],/models/grok-4/providers\nClaude 4.5 Sonnet,[object Object],/models/claude-4-5-sonnet-thinking/providers\nGemini 2.5 Flash (Sep),[object Object],/models/gemini-2-5-flash-preview-09-2025-reasoning/providers\nDeepSeek V3.2 Speciale,[object Object],/models/deepseek-v3-2-speciale/providers\nGPT-5 mini (high),[object Object],/models/gpt-5-mini/providers\nGPT-5.2,[object Object],/models/gpt-5-2-non-reasoning/providers\nDeepSeek V3.2 Exp,[object Object],/models/deepseek-v3-2-reasoning-0925/providers\nDeepSeek V3.2,[object Object],/models/deepseek-v3-2-reasoning/providers\nDeepSeek R1 0528,[object Object],/models/deepseek-r1/providers\nGrok 4 Fast,[object Object],/models/grok-4-fast-reasoning/providers\nGrok 4.1 Fast,[object Object],/models/grok-4-1-fast-reasoning/providers\nGLM-4.6,[object Object],/models/glm-4-6-reasoning/providers\nQwen3 Max Thinking (Preview),[object Object],/models/qwen3-max-thinking-preview/providers\nQwen3 235B A22B 2507,[object Object],/models/qwen3-235b-a22b-instruct-2507-reasoning/providers\nGPT-5.1 Codex mini (high),[object Object],/models/gpt-5-1-codex-mini/providers\nDoubao-Seed-1.8,[object Object],/models/doubao-seed-1-8/providers\nMiMo-V2-Flash,[object Object],/models/mimo-v2-flash-reasoning/providers\nGPT-5 (minimal),[object Object],/models/gpt-5-minimal/providers\nMiniMax-M2.1,[object Object],/models/minimax-m2-1/providers\nGemini 2.5 Flash-Lite (Sep),[object Object],/models/gemini-2-5-flash-lite-preview-09-2025-reasoning/providers\nClaude 4.5 Haiku,[object Object],/models/claude-4-5-haiku-reasoning/providers\nQwen3 Max,[object Object],/models/qwen3-max/providers\ngpt-oss-120B (high),[object Object],/models/gpt-oss-120b/providers\nDeepSeek V3.1 Terminus,[object Object],/models/deepseek-v3-1-terminus-reasoning/providers\nDeepSeek V3.2,[object Object],/models/deepseek-v3-2/providers\nLlama 4 Maverick,[object Object],/models/llama-4-maverick/providers\nQwen3 Next 80B A3B,[object Object],/models/qwen3-next-80b-a3b-reasoning/providers\nKAT-Coder-Pro V1,[object 
Object],/models/kat-coder-pro-v1/providers\nMagistral Medium 1.2,[object Object],/models/magistral-medium-2509/providers\nGPT-5 nano (high),[object Object],/models/gpt-5-nano/providers\nNova 2.0 Omni (low),[object Object],/models/nova-2-0-omni-reasoning-low/providers\nNova 2.0 Omni (medium),[object Object],/models/nova-2-0-omni-reasoning-medium/providers\nLlama Nemotron Super 49B v1.5,[object Object],/models/llama-nemotron-super-49b-v1-5-reasoning/providers\nMiniMax-M2,[object Object],/models/minimax-m2/providers\nKimi K2 0905,[object Object],/models/kimi-k2-0905/providers\nGLM-4.6V,[object Object],/models/glm-4-6v-reasoning/providers\nINTELLECT-3,[object Object],/models/intellect-3/providers\nMistral Large 3,[object Object],/models/mistral-large-3/providers\ngpt-oss-120B (low),[object Object],/models/gpt-oss-120b-low/providers\nGLM-4.7,[object Object],/models/glm-4-7-non-reasoning/providers\nRing-1T,[object Object],/models/ring-1t/providers\nDoubao Seed Code,[object Object],/models/doubao-seed-code/providers\nNova Premier,[object Object],/models/nova-premier/providers\nSolar Pro 2,[object Object],/models/solar-pro-2-reasoning/providers\nK-EXAONE,[object Object],/models/k-exaone/providers\nNova 2.0 Pro Preview,[object Object],/models/nova-2-0-pro/providers\nK2-V2 (high),[object Object],/models/k2-v2/providers\nSeed-OSS-36B-Instruct,[object Object],/models/seed-oss-36b-instruct/providers\nDevstral 2,[object Object],/models/devstral-2/providers\nK2-V2 (medium),[object Object],/models/k2-v2-medium/providers\nLlama 3.3 70B,[object Object],/models/llama-3-3-instruct-70b/providers\nGemma 3 27B,[object Object],/models/gemma-3-27b/providers\nMagistral Small 1.2,[object Object],/models/magistral-small-2509/providers\nSolar Pro 2,[object Object],/models/solar-pro-2/providers\ngpt-oss-20B (high),[object Object],/models/gpt-oss-20b/providers\nGLM-4.6V,[object Object],/models/glm-4-6v/providers\nMi:dm K 2.5 Pro,[object Object],/models/mi-dm-k-2-5-pro-dec28/providers\nLlama 4 Scout,[object Object],/models/llama-4-scout/providers\nKimi K2 Thinking,[object Object],/models/kimi-k2-thinking/providers\nEXAONE 4.0 32B,[object Object],/models/exaone-4-0-32b-reasoning/providers\ngpt-oss-20B (low),[object Object],/models/gpt-oss-20b-low/providers\nNova 2.0 Omni,[object Object],/models/nova-2-0-omni/providers\nMistral Small 3.2,[object Object],/models/mistral-small-3-2/providers\nQwen3 30B A3B 2507,[object Object],/models/qwen3-30b-a3b-2507/providers\nOlmo 3.1 32B Think,[object Object],/models/olmo-3-1-32b-think/providers\nK2-V2 (low),[object Object],/models/k2-v2-low/providers\nK-EXAONE,[object Object],/models/k-exaone-non-reasoning/providers\nGemma 3 12B,[object Object],/models/gemma-3-12b/providers\nApriel-v1.5-15B-Thinker,[object Object],/models/apriel-v1-5-15b-thinker/providers\nDevstral Small 2,[object Object],/models/devstral-small-2/providers\nMinistral 3 14B,[object Object],/models/ministral-3-14b/providers\nMinistral 3 8B,[object Object],/models/ministral-3-8b/providers\nMiMo-V2-Flash,[object Object],/models/mimo-v2-flash/providers\nNVIDIA Nemotron 3 Nano,[object Object],/models/nvidia-nemotron-3-nano-30b-a3b/providers\nMinistral 3 3B,[object Object],/models/ministral-3-3b/providers\nGemma 3 4B,[object Object],/models/gemma-3-4b/providers\nGemma 3 1B,[object Object],/models/gemma-3-1b/providers\nLlama 3.1 8B,[object Object],/models/llama-3-1-instruct-8b/providers\nLFM2 2.6B,[object Object],/models/lfm2-2-6b/providers"}

Multilingual Index: Average Across All Languages

Artificial Analysis Multilingual Index; Average across all languages; Higher is better
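If the headline figure is an unweighted mean of the per-language scores (an assumption; see the methodology for the actual weighting), the computation is straightforward:

```python
from statistics import mean

# Hypothetical per-language index scores for one model (illustration values only).
per_language = {"English": 91.0, "Chinese": 88.5, "Hindi": 84.0, "Spanish": 89.5}

multilingual_index = mean(per_language.values())
print(round(multilingual_index, 2))  # 88.25
```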


Multilingual Index: Average vs. Output Speed

Artificial Analysis Multilingual Index; 1,000 input tokens
Most attractive quadrant highlighted. Model creators shown include Anthropic, DeepSeek, Google, LG AI Research, Meta, MiniMax, Mistral, OpenAI, and xAI.

There is a trade-off between model quality and output speed: higher-intelligence models typically generate tokens more slowly.


Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API, for models that support streaming).

Multilingual Index: Average vs. Price

Artificial Analysis Multilingual Index; Average across all languages
Most attractive quadrant highlighted. Model creators shown include Anthropic, DeepSeek, Google, Meta, MiniMax, Mistral, OpenAI, and xAI.

While higher-intelligence models are typically more expensive, they do not all follow the same price-quality curve.


Price per token, represented as USD per million tokens. The price is a blend of input and output token prices at a 3:1 input-to-output ratio.
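For example, the 3:1 blend can be computed as a weighted average, with input tokens weighted three times output tokens (the prices below are made-up illustration values):

```python
# Blended price per million tokens at a 3:1 input-to-output token ratio,
# per the definition above.

def blended_price(input_usd_per_m: float, output_usd_per_m: float) -> float:
    return (3 * input_usd_per_m + 1 * output_usd_per_m) / 4

# e.g. a model priced at $1.00/M input and $4.00/M output tokens:
print(blended_price(1.00, 4.00))  # (3*1.00 + 4.00)/4 = 1.75 USD per million tokens
```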

Multilingual Global-MMLU-Lite: Average

Average across all languages; Higher is better

A multilingual version of Massive Multitask Language Understanding (MMLU), evaluated across multiple languages. It tests general knowledge and reasoning ability in areas such as science, the humanities, and mathematics. See the methodology for further details.

Pricing: Input and Output Prices

Price: USD per 1M tokens; input and output prices shown separately.

Input price: price per token included in the request/message sent to the API, represented as USD per million tokens.

Figures represent performance of the model's first-party API (e.g. OpenAI for o1) or the median across providers where a first-party API is not available (e.g. Meta's Llama models).

Output Speed

Output Tokens per Second; Higher is better

Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API, for models that support streaming).
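A minimal sketch of how such a measurement can be made against a streaming API; the `stream_chunks` iterator and its `(text, token_count)` chunk shape are hypothetical stand-ins for a real client:

```python
import time

def measure_output_speed(stream_chunks):
    """Tokens per second while generating, i.e. after the first chunk arrives.

    `stream_chunks` is a hypothetical iterator yielding (text, token_count)
    pairs from a streaming API; a real client would be adapted to this shape.
    """
    first_at = last_at = None
    tokens_after_first = 0
    for _text, token_count in stream_chunks:
        last_at = time.monotonic()
        if first_at is None:
            first_at = last_at           # first chunk: starts the clock
        else:
            tokens_after_first += token_count
    if first_at is None or last_at == first_at:
        return 0.0                        # empty or single-chunk stream
    return tokens_after_first / (last_at - first_at)
```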


{"@context":"https://schema.org","@type":"Dataset","name":"Output Speed","creator":{"@type":"Organization","name":"Artificial Analysis","url":"https://artificialanalysis.ai"},"description":"Output Tokens per Second; Higher is better","measurementTechnique":"Independent test run by Artificial Analysis on dedicated hardware.","spatialCoverage":"Worldwide","keywords":["analytics","llm","AI","benchmark","model","gpt","claude"],"license":"https://creativecommons.org/licenses/by/4.0/","isAccessibleForFree":true,"citation":"Artificial Analysis (2025). LLM benchmarks dataset. https://artificialanalysis.ai","data":""}

Latency: Time To First Answer Token

Seconds to first answer token received; accounts for reasoning model 'thinking' time. Components: input processing, plus thinking for reasoning models where applicable.

Time to first answer token received, in seconds, after the API request is sent. For reasoning models, this includes the model's 'thinking' time before it provides an answer. For models that do not support streaming, this represents the time to receive the completion.
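A minimal sketch of this measurement; the `(kind, text)` chunk shape is a hypothetical stand-in, since real APIs expose reasoning content in provider-specific ways:

```python
import time

def time_to_first_answer_token(stream_chunks, request_sent_at):
    """Seconds from request to the first *answer* token, per the definition above.

    `stream_chunks` is a hypothetical iterator yielding (kind, text) pairs,
    where kind is "reasoning" or "answer". Any reasoning chunks received
    before the first answer chunk count as 'thinking' time.
    """
    for kind, _text in stream_chunks:
        if kind == "answer":
            return time.monotonic() - request_sent_at
    return None  # stream ended without an answer token
```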

End-to-End Response Time

Seconds to output 500 tokens, including reasoning model 'thinking' time; lower is better. Components: 'thinking' time (reasoning models), input processing time, and outputting time.

Seconds to receive a 500-token response. Key components:

  • Input processing time: time to receive the first response token
  • Thinking time (reasoning models only): time the model spends outputting reasoning tokens before providing an answer; the token count is based on the average number of reasoning tokens across a diverse set of 60 prompts (see methodology details)
  • Answer time: time to generate 500 output tokens, based on output speed (see the worked example below)
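A minimal worked example with made-up figures for a hypothetical reasoning model (this also assumes reasoning tokens are generated at the same speed as answer tokens, which the page does not state):

```python
# End-to-end response time from its components, per the breakdown above.
# All figures are made-up illustration values, not measured results.
input_processing_s = 0.5          # time to first response token
thinking_tokens = 2_000           # average reasoning tokens (hypothetical)
output_speed_tps = 100.0          # output tokens per second
answer_tokens = 500               # fixed answer length used by the benchmark

thinking_s = thinking_tokens / output_speed_tps   # 20.0 s
answer_s = answer_tokens / output_speed_tps       # 5.0 s
end_to_end_s = input_processing_s + thinking_s + answer_s
print(f"{end_to_end_s:.1f} s")                    # 25.5 s
```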
