Multilingual AI Model Benchmark: Compare Leading LLMs by Language
Last updated: January 28, 2026
Top models are highlighted for the overall average across all languages and for individual languages, including English, Chinese, Hindi, Spanish, French, and other languages.
Explore how leading large language models (LLMs) perform across multiple languages on Artificial Analysis' Multilingual Index, including the Global-MMLU-Lite benchmark. Filter by language and model, view trade-offs between accuracy, speed, and cost, and find the best LLM for your multilingual use case.
For details on datasets and methodology, see the FAQ page.
Artificial Analysis Multilingual Index
An index assessing multilingual performance in general reasoning across multiple languages. Results are computed across English, Chinese, Hindi, Spanish, French, Arabic, Bengali, Portuguese, Indonesian, Japanese, Swahili, German, Korean, Italian, Yoruba, Burmese. See Multilingual Intelligence Index methodology for further details.
Multilingual Index Across Languages (Normalized)
{"@context":"https://schema.org","@type":"Dataset","name":"Multilingual Index Across Languages (Normalized)","creator":{"@type":"Organization","name":"Artificial Analysis","url":"https://artificialanalysis.ai"},"description":"Scores are normalized per language across all models tested, where green represents the highest score for that language and red represents the lowest score for that language.","measurementTechnique":"Independent test run by Artificial Analysis on dedicated hardware.","spatialCoverage":"Worldwide","keywords":["analytics","llm","AI","benchmark","model","gpt","claude"],"license":"https://creativecommons.org/licenses/by/4.0/","isAccessibleForFree":true,"citation":"Artificial Analysis (2025). LLM benchmarks dataset. https://artificialanalysis.ai","data":"label,multilingual,detailsUrl\nGemini 3 Pro Preview (high),[object Object],/models/gemini-3-pro/providers\nGemini 3 Flash,[object Object],/models/gemini-3-flash-reasoning/providers\nClaude Opus 4.5,[object Object],/models/claude-opus-4-5-thinking/providers\nGPT-5 (high),[object Object],/models/gpt-5/providers\nClaude Opus 4.5,[object Object],/models/claude-opus-4-5/providers\nGPT-5.1 (high),[object Object],/models/gpt-5-1/providers\nGPT-5 (medium),[object Object],/models/gpt-5-medium/providers\nGemini 2.5 Pro,[object Object],/models/gemini-2-5-pro/providers\nGPT-5.2 (medium),[object Object],/models/gpt-5-2-medium/providers\nGPT-5.1 Codex (high),[object Object],/models/gpt-5-1-codex/providers\nGrok 4,[object Object],/models/grok-4/providers\nClaude 4.5 Sonnet,[object Object],/models/claude-4-5-sonnet-thinking/providers\nGemini 2.5 Flash (Sep),[object Object],/models/gemini-2-5-flash-preview-09-2025-reasoning/providers\nDeepSeek V3.2 Speciale,[object Object],/models/deepseek-v3-2-speciale/providers\nGPT-5 mini (high),[object Object],/models/gpt-5-mini/providers\nGPT-5.2,[object Object],/models/gpt-5-2-non-reasoning/providers\nDeepSeek V3.2 Exp,[object Object],/models/deepseek-v3-2-reasoning-0925/providers\nDeepSeek V3.2,[object Object],/models/deepseek-v3-2-reasoning/providers\nDeepSeek R1 0528,[object Object],/models/deepseek-r1/providers\nGrok 4 Fast,[object Object],/models/grok-4-fast-reasoning/providers\nGrok 4.1 Fast,[object Object],/models/grok-4-1-fast-reasoning/providers\nGLM-4.6,[object Object],/models/glm-4-6-reasoning/providers\nQwen3 Max Thinking (Preview),[object Object],/models/qwen3-max-thinking-preview/providers\nQwen3 235B A22B 2507,[object Object],/models/qwen3-235b-a22b-instruct-2507-reasoning/providers\nGPT-5.1 Codex mini (high),[object Object],/models/gpt-5-1-codex-mini/providers\nDoubao-Seed-1.8,[object Object],/models/doubao-seed-1-8/providers\nMiMo-V2-Flash,[object Object],/models/mimo-v2-flash-reasoning/providers\nGPT-5 (minimal),[object Object],/models/gpt-5-minimal/providers\nMiniMax-M2.1,[object Object],/models/minimax-m2-1/providers\nGemini 2.5 Flash-Lite (Sep),[object Object],/models/gemini-2-5-flash-lite-preview-09-2025-reasoning/providers\nClaude 4.5 Haiku,[object Object],/models/claude-4-5-haiku-reasoning/providers\nQwen3 Max,[object Object],/models/qwen3-max/providers\ngpt-oss-120B (high),[object Object],/models/gpt-oss-120b/providers\nDeepSeek V3.1 Terminus,[object Object],/models/deepseek-v3-1-terminus-reasoning/providers\nDeepSeek V3.2,[object Object],/models/deepseek-v3-2/providers\nLlama 4 Maverick,[object Object],/models/llama-4-maverick/providers\nQwen3 Next 80B A3B,[object Object],/models/qwen3-next-80b-a3b-reasoning/providers\nKAT-Coder-Pro V1,[object 
Object],/models/kat-coder-pro-v1/providers\nMagistral Medium 1.2,[object Object],/models/magistral-medium-2509/providers\nGPT-5 nano (high),[object Object],/models/gpt-5-nano/providers\nNova 2.0 Omni (low),[object Object],/models/nova-2-0-omni-reasoning-low/providers\nNova 2.0 Omni (medium),[object Object],/models/nova-2-0-omni-reasoning-medium/providers\nLlama Nemotron Super 49B v1.5,[object Object],/models/llama-nemotron-super-49b-v1-5-reasoning/providers\nMiniMax-M2,[object Object],/models/minimax-m2/providers\nKimi K2 0905,[object Object],/models/kimi-k2-0905/providers\nGLM-4.6V,[object Object],/models/glm-4-6v-reasoning/providers\nINTELLECT-3,[object Object],/models/intellect-3/providers\nMistral Large 3,[object Object],/models/mistral-large-3/providers\ngpt-oss-120B (low),[object Object],/models/gpt-oss-120b-low/providers\nGLM-4.7,[object Object],/models/glm-4-7-non-reasoning/providers\nRing-1T,[object Object],/models/ring-1t/providers\nDoubao Seed Code,[object Object],/models/doubao-seed-code/providers\nNova Premier,[object Object],/models/nova-premier/providers\nSolar Pro 2,[object Object],/models/solar-pro-2-reasoning/providers\nK-EXAONE,[object Object],/models/k-exaone/providers\nNova 2.0 Pro Preview,[object Object],/models/nova-2-0-pro/providers\nK2-V2 (high),[object Object],/models/k2-v2/providers\nSeed-OSS-36B-Instruct,[object Object],/models/seed-oss-36b-instruct/providers\nDevstral 2,[object Object],/models/devstral-2/providers\nK2-V2 (medium),[object Object],/models/k2-v2-medium/providers\nLlama 3.3 70B,[object Object],/models/llama-3-3-instruct-70b/providers\nGemma 3 27B,[object Object],/models/gemma-3-27b/providers\nMagistral Small 1.2,[object Object],/models/magistral-small-2509/providers\nSolar Pro 2,[object Object],/models/solar-pro-2/providers\ngpt-oss-20B (high),[object Object],/models/gpt-oss-20b/providers\nGLM-4.6V,[object Object],/models/glm-4-6v/providers\nMi:dm K 2.5 Pro,[object Object],/models/mi-dm-k-2-5-pro-dec28/providers\nLlama 4 Scout,[object Object],/models/llama-4-scout/providers\nKimi K2 Thinking,[object Object],/models/kimi-k2-thinking/providers\nEXAONE 4.0 32B,[object Object],/models/exaone-4-0-32b-reasoning/providers\ngpt-oss-20B (low),[object Object],/models/gpt-oss-20b-low/providers\nNova 2.0 Omni,[object Object],/models/nova-2-0-omni/providers\nMistral Small 3.2,[object Object],/models/mistral-small-3-2/providers\nQwen3 30B A3B 2507,[object Object],/models/qwen3-30b-a3b-2507/providers\nOlmo 3.1 32B Think,[object Object],/models/olmo-3-1-32b-think/providers\nK2-V2 (low),[object Object],/models/k2-v2-low/providers\nK-EXAONE,[object Object],/models/k-exaone-non-reasoning/providers\nGemma 3 12B,[object Object],/models/gemma-3-12b/providers\nApriel-v1.5-15B-Thinker,[object Object],/models/apriel-v1-5-15b-thinker/providers\nDevstral Small 2,[object Object],/models/devstral-small-2/providers\nMinistral 3 14B,[object Object],/models/ministral-3-14b/providers\nMinistral 3 8B,[object Object],/models/ministral-3-8b/providers\nMiMo-V2-Flash,[object Object],/models/mimo-v2-flash/providers\nNVIDIA Nemotron 3 Nano,[object Object],/models/nvidia-nemotron-3-nano-30b-a3b/providers\nMinistral 3 3B,[object Object],/models/ministral-3-3b/providers\nGemma 3 4B,[object Object],/models/gemma-3-4b/providers\nGemma 3 1B,[object Object],/models/gemma-3-1b/providers\nLlama 3.1 8B,[object Object],/models/llama-3-1-instruct-8b/providers\nLFM2 2.6B,[object Object],/models/lfm2-2-6b/providers"}
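To make the two views concrete: the index is an average over per-language results, while the normalized chart rescales each language independently so the best model per language appears green and the worst red. Below is a minimal sketch, assuming a plain average for the index and min-max scaling per language for the normalized view; model names and scores are made up, and the exact aggregation Artificial Analysis uses may differ.

# Illustrative sketch only: per-model multilingual average plus per-language
# min-max normalization (as used for the heat-map colouring). Names and
# scores are hypothetical.
from statistics import mean

raw_scores = {  # hypothetical per-language benchmark scores
    "Model A": {"English": 0.92, "Hindi": 0.81, "Swahili": 0.65},
    "Model B": {"English": 0.88, "Hindi": 0.84, "Swahili": 0.72},
    "Model C": {"English": 0.75, "Hindi": 0.60, "Swahili": 0.40},
}
languages = ["English", "Hindi", "Swahili"]

# Multilingual index: simple average of a model's per-language scores.
index = {m: mean(s[lang] for lang in languages) for m, s in raw_scores.items()}

# Normalized view: scale each language to [0, 1] across all models,
# so the best model per language maps to 1 (green) and the worst to 0 (red).
normalized = {m: {} for m in raw_scores}
for lang in languages:
    vals = [raw_scores[m][lang] for m in raw_scores]
    lo, hi = min(vals), max(vals)
    for m in raw_scores:
        normalized[m][lang] = (raw_scores[m][lang] - lo) / (hi - lo) if hi > lo else 0.0

for m in raw_scores:
    print(m, round(index[m], 3), {k: round(v, 2) for k, v in normalized[m].items()})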
Multilingual Index: Average Across All Languages
Multilingual Index: Average vs. Output Speed
There is a trade-off between model quality and output speed, with higher intelligence models typically having lower output speed.
Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API, for models that support streaming).
Multilingual Index: Average vs. Price
While higher intelligence models are typically more expensive, they do not all follow the same price-quality curve.
Price per token, represented as USD per million tokens. Price is a blend of input and output token prices (3:1 ratio).
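As a concrete illustration of the blend, the sketch below assumes the 3:1 weighting means three parts input price to one part output price; the prices used are placeholders, not benchmark figures.

# Blended price sketch, assuming a 3:1 input-to-output weighting.
def blended_price(input_usd_per_mtok: float, output_usd_per_mtok: float) -> float:
    """Blend input and output prices (USD per million tokens) at 3:1."""
    return (3 * input_usd_per_mtok + output_usd_per_mtok) / 4

# Placeholder prices: $1.25 per million input tokens, $10.00 per million output tokens.
print(blended_price(1.25, 10.00))  # 3.4375 USD per million blended tokens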
Multilingual Global-MMLU-Lite: Average
A multilingual version of Massive Multitask Language Understanding, evaluated across multiple languages. Tests general knowledge and reasoning ability in areas like science, humanities, mathematics and more. See methodology for further details.
Pricing: Input and Output Prices
Input price: price per token included in the request/message sent to the API. Output price: price per token generated by the model. Both are represented as USD per million tokens.
Figures represent performance of the model's first-party API (e.g. OpenAI for o1) or the median across providers where a first-party API is not available (e.g. Meta's Llama models).
Output Speed
Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API, for models that support streaming).
Figures represent performance of the model's first-party API (e.g. OpenAI for o1) or the median across providers where a first-party API is not available (e.g. Meta's Llama models).
{"@context":"https://schema.org","@type":"Dataset","name":"Output Speed","creator":{"@type":"Organization","name":"Artificial Analysis","url":"https://artificialanalysis.ai"},"description":"Output Tokens per Second; Higher is better","measurementTechnique":"Independent test run by Artificial Analysis on dedicated hardware.","spatialCoverage":"Worldwide","keywords":["analytics","llm","AI","benchmark","model","gpt","claude"],"license":"https://creativecommons.org/licenses/by/4.0/","isAccessibleForFree":true,"citation":"Artificial Analysis (2025). LLM benchmarks dataset. https://artificialanalysis.ai","data":""}
Latency: Time To First Answer Token
Time to first answer token received, in seconds, after the API request is sent. For reasoning models, this includes the model's 'thinking' time before it provides an answer. For models which do not support streaming, this represents the time to receive the completion.
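Both output speed and time to first answer token can be estimated from a single streaming request. The sketch below uses the OpenAI Python client purely as an example harness; the model name and prompt are placeholders, chunk counts only approximate token counts, and this is not the measurement setup Artificial Analysis runs.

# Rough sketch: measure time-to-first-answer-token and output speed from a
# streaming chat completion. Model name and prompt are placeholders.
import time
from openai import OpenAI

client = OpenAI()
start = time.monotonic()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="gpt-5-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarise the benchmark."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.monotonic()  # includes any 'thinking' time
        chunks += 1  # approximation: one content chunk ~ one token
end = time.monotonic()

if first_token_at is not None and chunks > 1:
    ttft = first_token_at - start
    output_speed = chunks / (end - first_token_at)  # tokens/sec after first chunk
    print(f"TTFT: {ttft:.2f}s, output speed: ~{output_speed:.1f} tokens/s")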
End-to-End Response Time
Seconds to receive a 500-token response. Key components (a worked example follows below):
- Input time: Time to receive the first response token
- Thinking time (reasoning models only): Time reasoning models spend outputting reasoning tokens before providing an answer. The token count is based on the average number of reasoning tokens across a diverse set of 60 prompts (methodology details).
- Answer time: Time to generate 500 output tokens, based on output speed
Figures represent performance of the model's first-party API (e.g. OpenAI for o1) or the median across providers where a first-party API is not available (e.g. Meta's Llama models).
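Putting the components together, here is a worked example with made-up figures (these are illustrative inputs, not measured values from the benchmark):

# Worked example of the end-to-end components above, with made-up figures.
ttft_seconds = 0.6          # input time: time to first response token
reasoning_tokens = 1200     # average reasoning tokens (reasoning models only)
output_speed_tps = 80.0     # output tokens per second
answer_tokens = 500         # fixed answer length used for this metric

thinking_time = reasoning_tokens / output_speed_tps   # 15.00 s
answer_time = answer_tokens / output_speed_tps        # 6.25 s
end_to_end = ttft_seconds + thinking_time + answer_time
print(f"End-to-end response time: {end_to_end:.2f} s")  # 21.85 s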