Comparison of Models: Intelligence, Performance & Price Analysis

Comparison and analysis of AI models across key performance metrics including quality, price, output speed, latency, context window & others. Click on any model to see detailed metrics. For more details, including our methodology, see our FAQs.

Model Comparison Summary

  • Intelligence: Gemini 3 Pro Preview and GPT-5.1 (high) are the highest-intelligence models, followed by GPT-5 Codex (high) & GPT-5 (high).
  • Output Speed (tokens/s): Gemini 2.5 Flash-Lite (Sep) (475 t/s) and Granite 3.3 8B (423 t/s) are the fastest models, followed by Gemini 2.5 Flash & Granite 4.0 H Small.
  • Latency (seconds): DeepSeek-OCR (0.19s) and Apriel-v1.5-15B-Thinker (0.25s) are the lowest-latency models, followed by Gemini 2.0 Flash-Lite (Feb) & Mistral Small (Feb).
  • Price ($ per M tokens): Gemma 3n E4B ($0.03) and Ministral 3B ($0.04) are the cheapest models, followed by Gemma 2 9B & DeepSeek-OCR.
  • Context Window: Llama 4 Scout (10m) and MiniMax-Text-01 (4m) have the largest context windows, followed by Grok 4 Fast & Grok 4.1 Fast.

Highlights

  • Intelligence: Artificial Analysis Intelligence Index; higher is better
  • Speed: Output tokens per second; higher is better
  • Price: USD per 1M tokens; lower is better

Navigation

Intelligence · Intelligence Index Comparisons · Intelligence Index Token Use & Cost · Context Window · Pricing · Performance Summary · Speed · Latency · End-to-End Response Time · Model Size (Open Weights Models Only) · Artificial Analysis Omniscience

Intelligence


Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index v3.0 incorporates 10 evaluations: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME 2025, IFBench, AA-LCR, Terminal-Bench Hard, 𝜏²-Bench Telecom

Combination metric covering multiple dimensions of intelligence - the simplest way to compare how smart models are. Version 3.0 was released in September 2025 and includes: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME 2025, IFBench, AA-LCR, Terminal-Bench Hard, 𝜏²-Bench Telecom. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.
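As a rough illustration of how such a combination metric can be computed, the sketch below takes a simple equal-weighted average of the ten evaluation scores. The equal weighting is an assumption made here for illustration only; the official weighting and normalisation are described in the Intelligence Index methodology.

```python
# Illustrative only: equal-weight average of per-evaluation scores (each 0-100).
# The actual Artificial Analysis weighting/normalisation may differ.
EVALS = [
    "MMLU-Pro", "GPQA Diamond", "Humanity's Last Exam", "LiveCodeBench",
    "SciCode", "AIME 2025", "IFBench", "AA-LCR",
    "Terminal-Bench Hard", "Tau2-Bench Telecom",
]

def intelligence_index(scores: dict[str, float]) -> float:
    """Combine per-evaluation scores (each 0-100) into a single 0-100 index."""
    missing = [e for e in EVALS if e not in scores]
    if missing:
        raise ValueError(f"missing evaluation scores: {missing}")
    return sum(scores[e] for e in EVALS) / len(EVALS)

# Hypothetical example scores, not real results:
example = dict(zip(EVALS, [80, 75, 20, 70, 45, 90, 60, 55, 35, 70]))
print(round(intelligence_index(example), 1))  # 60.0
```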

{"@context":"https://schema.org","@type":"Dataset","name":"Artificial Analysis Intelligence Index","creator":{"@type":"Organization","name":"Artificial Analysis","url":"https://artificialanalysis.ai"},"description":"Artificial Analysis Intelligence Index v3.0 incorporates 10 evaluations: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME 2025, IFBench, AA-LCR, Terminal-Bench Hard, 𝜏²-Bench Telecom","measurementTechnique":"Independent test run by Artificial Analysis on dedicated hardware.","spatialCoverage":"Worldwide","keywords":["analytics","llm","AI","benchmark","model","gpt","claude"],"license":"https://creativecommons.org/licenses/by/4.0/","isAccessibleForFree":true,"citation":"Artificial Analysis (2025). LLM benchmarks dataset. https://artificialanalysis.ai","data":""}

Artificial Analysis Coding Index

Represents the average of coding benchmarks in the Artificial Analysis Intelligence Index (LiveCodeBench, SciCode, Terminal-Bench Hard)

Represents the average of coding evaluations in the Artificial Analysis Intelligence Index. Currently includes: LiveCodeBench, SciCode, Terminal-Bench Hard. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.

{"@context":"https://schema.org","@type":"Dataset","name":"Artificial Analysis Coding Index","creator":{"@type":"Organization","name":"Artificial Analysis","url":"https://artificialanalysis.ai"},"description":"Represents the average of coding benchmarks in the Artificial Analysis Intelligence Index (LiveCodeBench, SciCode, Terminal-Bench Hard)","measurementTechnique":"Independent test run by Artificial Analysis on dedicated hardware.","spatialCoverage":"Worldwide","keywords":["analytics","llm","AI","benchmark","model","gpt","claude"],"license":"https://creativecommons.org/licenses/by/4.0/","isAccessibleForFree":true,"citation":"Artificial Analysis (2025). LLM benchmarks dataset. https://artificialanalysis.ai","data":""}

Artificial Analysis Agentic Index

Represents the average of agentic capabilities benchmarks in the Artificial Analysis Intelligence Index (Terminal-Bench Hard, 𝜏²-Bench Telecom)

Artificial Analysis Intelligence Index by Open Weights vs Proprietary

Artificial Analysis Intelligence Index v3.0 incorporates 10 evaluations: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME 2025, IFBench, AA-LCR, Terminal-Bench Hard, 𝜏²-Bench Telecom

Indicates whether the model weights are available. Models are labelled as 'Commercial Use Restricted' if the weights are available but commercial use is limited (typically requires obtaining a paid license).

{"@context":"https://schema.org","@type":"Dataset","name":"Artificial Analysis Intelligence Index by Open Weights vs Proprietary","creator":{"@type":"Organization","name":"Artificial Analysis","url":"https://artificialanalysis.ai"},"description":"Artificial Analysis Intelligence Index v3.0 incorporates 10 evaluations: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME 2025, IFBench, AA-LCR, Terminal-Bench Hard, 𝜏²-Bench Telecom","measurementTechnique":"Independent test run by Artificial Analysis on dedicated hardware.","spatialCoverage":"Worldwide","keywords":["analytics","llm","AI","benchmark","model","gpt","claude"],"license":"https://creativecommons.org/licenses/by/4.0/","isAccessibleForFree":true,"citation":"Artificial Analysis (2025). LLM benchmarks dataset. https://artificialanalysis.ai","data":""}

Artificial Analysis Intelligence Index by Model Type

Artificial Analysis Intelligence Index v3.0 incorporates 10 evaluations: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME 2025, IFBench, AA-LCR, Terminal-Bench Hard, 𝜏²-Bench Telecom

Intelligence Evaluations

Intelligence evaluations measured independently by Artificial Analysis; Higher is better
Results claimed by AI Lab (not yet independently verified)
  • Terminal-Bench Hard (Agentic Coding & Terminal Use)
  • 𝜏²-Bench Telecom (Agentic Tool Use)
  • AA-LCR (Long Context Reasoning)
  • Humanity's Last Exam (Reasoning & Knowledge)
  • MMLU-Pro (Reasoning & Knowledge)
  • GPQA Diamond (Scientific Reasoning)
  • LiveCodeBench (Coding)
  • SciCode (Coding)
  • IFBench (Instruction Following)
  • AIME 2025 (Competition Math)
  • CritPt (Physics Reasoning)
  • MMMU Pro (Visual Reasoning)

While model intelligence generally translates across use cases, specific evaluations may be more relevant for certain use cases.


Artificial Analysis Omniscience


AA-Omniscience Index

AA-Omniscience Index (higher is better) measures knowledge reliability and hallucination. It rewards correct answers, penalizes hallucinations, and has no penalty for refusing to answer. Scores range from -100 to 100, where 0 means as many correct as incorrect answers, and negative scores mean more incorrect than correct.

{"@context":"https://schema.org","@type":"Dataset","name":"AA-Omniscience Index","creator":{"@type":"Organization","name":"Artificial Analysis","url":"https://artificialanalysis.ai"},"description":"AA-Omniscience Index (higher is better) measures knowledge reliability and hallucination. It rewards correct answers, penalizes hallucinations, and has no penalty for refusing to answer. Scores range from -100 to 100, where 0 means as many correct as incorrect answers, and negative scores mean more incorrect than correct.","measurementTechnique":"Independent test run by Artificial Analysis on dedicated hardware.","spatialCoverage":"Worldwide","keywords":["analytics","llm","AI","benchmark","model","gpt","claude"],"license":"https://creativecommons.org/licenses/by/4.0/","isAccessibleForFree":true,"citation":"Artificial Analysis (2025). LLM benchmarks dataset. https://artificialanalysis.ai","data":"modelName,omniscienceIndex,detailsUrl,isLabClaimedValue\nGemini 3 Pro Preview,12.867,/models/gemini-3-pro/providers,false\nClaude 4.1 Opus,4.933,/models/claude-4-1-opus-thinking/providers,false\nGPT-5.1 (high),2.2,/models/gpt-5-1/providers,false\nGrok 4,0.95,/models/grok-4/providers,false\nClaude 4.5 Sonnet,-2.083,/models/claude-4-5-sonnet-thinking/providers,false\nClaude 4.5 Haiku,-5.667,/models/claude-4-5-haiku-reasoning/providers,false\nGPT-5 (high),-11.1,/models/gpt-5/providers,false\nGPT-5 (low),-12.933,/models/gpt-5-low/providers,false\nGPT-5 (medium),-13.733,/models/gpt-5-medium/providers,false\nGemini 2.5 Pro,-17.95,/models/gemini-2-5-pro/providers,false\nLlama 3.1 405B,-18.167,/models/llama-3-1-instruct-405b/providers,false\nGPT-5 mini (high),-19.617,/models/gpt-5-mini/providers,false\nKimi K2 Thinking,-23.417,/models/kimi-k2-thinking/providers,false\nDeepSeek V3.1 Terminus,-26.7,/models/deepseek-v3-1-terminus-reasoning/providers,false\nMagistral Medium 1.2,-27.633,/models/magistral-medium-2509/providers,false\nKimi K2 0905,-28.35,/models/kimi-k2-0905/providers,false\nDeepSeek R1 0528,-29.667,/models/deepseek-r1/providers,false\nGrok 4 Fast,-30.483,/models/grok-4-fast-reasoning/providers,false\nGrok 4.1 Fast,-31.433,/models/grok-4-1-fast-reasoning/providers,false\nDeepSeek V3.2 Exp,-31.9,/models/deepseek-v3-2-reasoning/providers,false\nGPT-5.1,-36.583,/models/gpt-5-1-non-reasoning/providers,false\nGPT-5 (minimal),-36.667,/models/gpt-5-minimal/providers,false\nGemini 2.5 Flash (Sep),-37.5,/models/gemini-2-5-flash-preview-09-2025-reasoning/providers,false\nGPT-4.1,-42.133,/models/gpt-4-1/providers,false\nNVIDIA Nemotron Nano 9B V2,-43.217,/models/nvidia-nemotron-nano-9b-v2-reasoning/providers,false\nLlama 4 Maverick,-43.467,/models/llama-4-maverick/providers,false\nGLM-4.6,-43.883,/models/glm-4-6-reasoning/providers,false\nQwen3 235B 2507,-45.383,/models/qwen3-235b-a22b-instruct-2507/providers,false\nLlama Nemotron Super 49B v1.5,-47.467,/models/llama-nemotron-super-49b-v1-5-reasoning/providers,false\nQwen3 235B A22B 2507,-47.7,/models/qwen3-235b-a22b-instruct-2507-reasoning/providers,false\nMiniMax-M2,-49.533,/models/minimax-m2/providers,false\nCommand A,-50.4,/models/command-a/providers,false\ngpt-oss-120B (high),-51.933,/models/gpt-oss-120b/providers,false\nQwen3 Next 80B A3B,-52.783,/models/qwen3-next-80b-a3b-reasoning/providers,false\ngpt-oss-120B (low),-55.933,/models/gpt-oss-120b-low/providers,false\ngpt-oss-20B (low),-60.6,/models/gpt-oss-20b-low/providers,false\nEXAONE 4.0 32B,-61.417,/models/exaone-4-0-32b-reasoning/providers,false\nGranite 4.0 H 
Small,-62.067,/models/granite-4-0-h-small/providers,false\ngpt-oss-20B (high),-64.9,/models/gpt-oss-20b/providers,false\nQwen3 8B,-66.117,/models/qwen3-8b-instruct-reasoning/providers,false\nQwen3 8B,-75.4,/models/qwen3-8b-instruct/providers,false"}

AA-Omniscience Accuracy

AA-Omniscience Accuracy (higher is better) measures the proportion of correctly answered questions out of all questions, regardless of whether the model chooses to answer

{"@context":"https://schema.org","@type":"Dataset","name":"AA-Omniscience Accuracy","creator":{"@type":"Organization","name":"Artificial Analysis","url":"https://artificialanalysis.ai"},"description":"AA-Omniscience Accuracy (higher is better) measures the proportion of correctly answered questions out of all questions, regardless of whether the model chooses to answer","measurementTechnique":"Independent test run by Artificial Analysis on dedicated hardware.","spatialCoverage":"Worldwide","keywords":["analytics","llm","AI","benchmark","model","gpt","claude"],"license":"https://creativecommons.org/licenses/by/4.0/","isAccessibleForFree":true,"citation":"Artificial Analysis (2025). LLM benchmarks dataset. https://artificialanalysis.ai","data":"modelName,omniscienceAccuracy,detailsUrl,isLabClaimedValue\nGemini 3 Pro Preview,0.5365,/models/gemini-3-pro/providers,false\nGrok 4,0.39466666666666667,/models/grok-4/providers,false\nGPT-5 (high),0.38616666666666666,/models/gpt-5/providers,false\nGemini 2.5 Pro,0.37483333333333335,/models/gemini-2-5-pro/providers,false\nGPT-5 (medium),0.37383333333333335,/models/gpt-5-medium/providers,false\nGPT-5 (low),0.3635,/models/gpt-5-low/providers,false\nClaude 4.1 Opus,0.35933333333333334,/models/claude-4-1-opus-thinking/providers,false\nGPT-5.1 (high),0.353,/models/gpt-5-1/providers,false\nClaude 4.5 Sonnet,0.309,/models/claude-4-5-sonnet-thinking/providers,false\nDeepSeek R1 0528,0.29283333333333333,/models/deepseek-r1/providers,false\nKimi K2 Thinking,0.29233333333333333,/models/kimi-k2-thinking/providers,false\nGPT-5.1,0.278,/models/gpt-5-1-non-reasoning/providers,false\nGPT-5 (minimal),0.27216666666666667,/models/gpt-5-minimal/providers,false\nDeepSeek V3.1 Terminus,0.27166666666666667,/models/deepseek-v3-1-terminus-reasoning/providers,false\nGemini 2.5 Flash (Sep),0.2698333333333333,/models/gemini-2-5-flash-preview-09-2025-reasoning/providers,false\nDeepSeek V3.2 Exp,0.26966666666666667,/models/deepseek-v3-2-reasoning/providers,false\nGPT-4.1,0.2608333333333333,/models/gpt-4-1/providers,false\nGLM-4.6,0.25483333333333336,/models/glm-4-6-reasoning/providers,false\nKimi K2 0905,0.24033333333333334,/models/kimi-k2-0905/providers,false\nLlama 4 Maverick,0.23433333333333334,/models/llama-4-maverick/providers,false\nGrok 4.1 Fast,0.23433333333333334,/models/grok-4-1-fast-reasoning/providers,false\nGPT-5 mini (high),0.22966666666666666,/models/gpt-5-mini/providers,false\nQwen3 235B A22B 2507,0.22116666666666668,/models/qwen3-235b-a22b-instruct-2507-reasoning/providers,false\nGrok 4 Fast,0.21933333333333332,/models/grok-4-fast-reasoning/providers,false\nLlama 3.1 405B,0.21733333333333332,/models/llama-3-1-instruct-405b/providers,false\nMiniMax-M2,0.20833333333333334,/models/minimax-m2/providers,false\nMagistral Medium 1.2,0.20083333333333334,/models/magistral-medium-2509/providers,false\ngpt-oss-120B (high),0.19983333333333334,/models/gpt-oss-120b/providers,false\nQwen3 Next 80B A3B,0.18166666666666667,/models/qwen3-next-80b-a3b-reasoning/providers,false\ngpt-oss-120B (low),0.1815,/models/gpt-oss-120b-low/providers,false\nQwen3 235B 2507,0.1755,/models/qwen3-235b-a22b-instruct-2507/providers,false\nClaude 4.5 Haiku,0.16183333333333333,/models/claude-4-5-haiku-reasoning/providers,false\nLlama Nemotron Super 49B v1.5,0.16033333333333333,/models/llama-nemotron-super-49b-v1-5-reasoning/providers,false\nCommand A,0.15466666666666667,/models/command-a/providers,false\ngpt-oss-20B (high),0.14666666666666667,/models/gpt-oss-20b/providers,false\ngpt-oss-20B 
(low),0.13733333333333334,/models/gpt-oss-20b-low/providers,false\nGranite 4.0 H Small,0.1345,/models/granite-4-0-h-small/providers,false\nEXAONE 4.0 32B,0.133,/models/exaone-4-0-32b-reasoning/providers,false\nQwen3 8B,0.12733333333333333,/models/qwen3-8b-instruct-reasoning/providers,false\nNVIDIA Nemotron Nano 9B V2,0.10716666666666666,/models/nvidia-nemotron-nano-9b-v2-reasoning/providers,false\nQwen3 8B,0.10283333333333333,/models/qwen3-8b-instruct/providers,false"}

AA-Omniscience Hallucination Rate

AA-Omniscience Hallucination Rate (lower is better) measures how often the model answers incorrectly when it should have refused, defined as the proportion of wrong answers out of all non-correct attempts

{"@context":"https://schema.org","@type":"Dataset","name":"AA-Omniscience Hallucination Rate","creator":{"@type":"Organization","name":"Artificial Analysis","url":"https://artificialanalysis.ai"},"description":"AA-Omniscience Hallucination Rate (lower is better) measures how often the model answers incorrectly when it should have refused, defined as the proportion of wrong answers out of all non-correct attempts","measurementTechnique":"Independent test run by Artificial Analysis on dedicated hardware.","spatialCoverage":"Worldwide","keywords":["analytics","llm","AI","benchmark","model","gpt","claude"],"license":"https://creativecommons.org/licenses/by/4.0/","isAccessibleForFree":true,"citation":"Artificial Analysis (2025). LLM benchmarks dataset. https://artificialanalysis.ai","data":"modelName,omniscienceHallucinationRate,detailsUrl,isLabClaimedValue\nQwen3 8B,0.955043655953929,/models/qwen3-8b-instruct/providers,false\ngpt-oss-20B (high),0.932421875,/models/gpt-oss-20b/providers,false\nGLM-4.6,0.9308879445314248,/models/glm-4-6-reasoning/providers,false\nGPT-4.1,0.9228861330326945,/models/gpt-4-1/providers,false\ngpt-oss-120B (low),0.9051109753614335,/models/gpt-oss-120b-low/providers,false\nQwen3 8B,0.9035523300229182,/models/qwen3-8b-instruct-reasoning/providers,false\ngpt-oss-120B (high),0.8987710893563841,/models/gpt-oss-120b/providers,false\nQwen3 235B A22B 2507,0.8964262786218703,/models/qwen3-235b-a22b-instruct-2507-reasoning/providers,false\nGPT-5.1,0.891735918744229,/models/gpt-5-1-non-reasoning/providers,false\nMiniMax-M2,0.8888421052631579,/models/minimax-m2/providers,false\nGemini 2.5 Pro,0.8866968808317782,/models/gemini-2-5-pro/providers,false\nGemini 2.5 Flash (Sep),0.883131705090162,/models/gemini-2-5-flash-preview-09-2025-reasoning/providers,false\nGemini 3 Pro Preview,0.8798993167925206,/models/gemini-3-pro/providers,false\nGPT-5 (minimal),0.8777192580719029,/models/gpt-5-minimal/providers,false\nLlama 4 Maverick,0.8752720940356987,/models/llama-4-maverick/providers,false\nGranite 4.0 H Small,0.8725207009435779,/models/granite-4-0-h-small/providers,false\nQwen3 Next 80B A3B,0.8665987780040734,/models/qwen3-next-80b-a3b-reasoning/providers,false\nEXAONE 4.0 32B,0.8612072279892349,/models/exaone-4-0-32b-reasoning/providers,false\ngpt-oss-20B (low),0.8605100463678517,/models/gpt-oss-20b-low/providers,false\nDeepSeek R1 0528,0.8336082960169692,/models/deepseek-r1/providers,false\nGPT-5 (medium),0.8163428267234496,/models/gpt-5-medium/providers,false\nGPT-5 (high),0.8099375509095845,/models/gpt-5/providers,false\nDeepSeek V3.2 Exp,0.8060246462802373,/models/deepseek-v3-2-reasoning/providers,false\nCommand A,0.7791798107255521,/models/command-a/providers,false\nGPT-5 (low),0.7742864624247185,/models/gpt-5-low/providers,false\nQwen3 235B 2507,0.7626844552253891,/models/qwen3-235b-a22b-instruct-2507/providers,false\nLlama Nemotron Super 49B v1.5,0.7562524811433108,/models/llama-nemotron-super-49b-v1-5-reasoning/providers,false\nKimi K2 Thinking,0.7439943476212906,/models/kimi-k2-thinking/providers,false\nDeepSeek V3.1 Terminus,0.7395881006864988,/models/deepseek-v3-1-terminus-reasoning/providers,false\nGrok 4.1 Fast,0.716586852416195,/models/grok-4-1-fast-reasoning/providers,false\nKimi K2 0905,0.6895568231680561,/models/kimi-k2-0905/providers,false\nGrok 4 Fast,0.6714346712211785,/models/grok-4-fast-reasoning/providers,false\nGrok 4,0.6379405286343612,/models/grok-4/providers,false\nNVIDIA Nemotron Nano 9B 
V2,0.6027627403397424,/models/nvidia-nemotron-nano-9b-v2-reasoning/providers,false\nMagistral Medium 1.2,0.5970802919708029,/models/magistral-medium-2509/providers,false\nGPT-5 mini (high),0.5527909995672868,/models/gpt-5-mini/providers,false\nGPT-5.1 (high),0.5115919629057187,/models/gpt-5-1/providers,false\nLlama 3.1 405B,0.5110732538330494,/models/llama-3-1-instruct-405b/providers,false\nClaude 4.1 Opus,0.4838709677419355,/models/claude-4-1-opus-thinking/providers,false\nClaude 4.5 Sonnet,0.47732754462132176,/models/claude-4-5-sonnet-thinking/providers,false\nClaude 4.5 Haiku,0.2606880095446411,/models/claude-4-5-haiku-reasoning/providers,false"}

Intelligence Index Comparisons


Intelligence vs. Price

Artificial Analysis Intelligence Index; Price: USD per 1M Tokens

While higher intelligence models are typically more expensive, they do not all follow the same price-quality curve.


Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).

Figures represent performance of the model's first-party API (e.g. OpenAI for o1) or the median across providers where a first-party API is not available (e.g. Meta's Llama models).
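For example, the 3:1 input-to-output blend works out as follows (the prices below are hypothetical, not any particular model's):

```python
def blended_price(input_price: float, output_price: float) -> float:
    """Blend input and output prices at a 3:1 input:output token ratio (USD per 1M tokens)."""
    return (3 * input_price + 1 * output_price) / 4

# Hypothetical prices: $1.25 per 1M input tokens, $10.00 per 1M output tokens
print(blended_price(1.25, 10.00))  # 3.4375
```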

Intelligence vs. Output Speed

Artificial Analysis Intelligence Index; Output Speed: Output Tokens per Second

There is a trade-off between model quality and output speed, with higher intelligence models typically having lower output speed.


Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API, for models which support streaming).


Intelligence vs. End-to-End Response Time

Artificial Analysis Intelligence Index; Seconds to Output 500 Tokens, including reasoning model 'thinking' time; Lower is better

Intelligence Index Token Use & Cost


Output Tokens Used to Run Artificial Analysis Intelligence Index

Tokens used to run all evaluations in the Artificial Analysis Intelligence Index
Broken down into answer tokens and reasoning tokens.

The number of tokens required to run all evaluations in the Artificial Analysis Intelligence Index (excluding repeats).

Intelligence vs. Output Tokens Used in Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index; Output Tokens Used in Artificial Analysis Intelligence Index


Cost to Run Artificial Analysis Intelligence Index

Cost (USD) to run all evaluations in the Artificial Analysis Intelligence Index
Broken down into input cost, output cost and reasoning cost.

The cost to run the evaluations in the Artificial Analysis Intelligence Index, calculated using the model's input and output token pricing and the number of tokens used across evaluations (excluding repeats).
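A minimal sketch of that calculation, with hypothetical token counts and prices:

```python
def eval_run_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Cost in USD to run the evaluations, given token counts and USD-per-1M-token prices.

    Reasoning tokens are billed as output tokens, so include them in output_tokens.
    """
    return (input_tokens / 1e6) * input_price + (output_tokens / 1e6) * output_price

# Hypothetical: 5M input tokens, 20M output + reasoning tokens, $1.25 / $10.00 per 1M tokens
print(eval_run_cost(5_000_000, 20_000_000, 1.25, 10.00))  # 206.25
```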

Intelligence vs. Cost to Run Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index; Cost to Run Intelligence Index


Context Window


Context Window

Context Window: Tokens Limit; Higher is better

Larger context windows are relevant to RAG (Retrieval Augmented Generation) LLM workflows which typically involve reasoning and information retrieval of large amounts of data.

Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varies by model).
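In other words, a request only fits if the prompt and the requested completion together stay within the window; a minimal sketch, treating the separate output cap as a per-model value:

```python
def request_fits(prompt_tokens: int, max_output_tokens: int,
                 context_window: int, max_output_limit: int | None = None) -> bool:
    """Check whether a request fits a model's context window.

    The context window bounds prompt + completion combined; many models also
    impose a separate, lower cap on output tokens (max_output_limit).
    """
    if max_output_limit is not None and max_output_tokens > max_output_limit:
        return False
    return prompt_tokens + max_output_tokens <= context_window

# Hypothetical: a 120k-token prompt plus a 10k-token completion against a 128k window
print(request_fits(120_000, 10_000, 128_000))  # False
```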

{"@context":"https://schema.org","@type":"Dataset","name":"Context Window","creator":{"@type":"Organization","name":"Artificial Analysis","url":"https://artificialanalysis.ai"},"description":"Context Window: Tokens Limit; Higher is better","measurementTechnique":"Independent test run by Artificial Analysis on dedicated hardware.","spatialCoverage":"Worldwide","keywords":["analytics","llm","AI","benchmark","model","gpt","claude"],"license":"https://creativecommons.org/licenses/by/4.0/","isAccessibleForFree":true,"citation":"Artificial Analysis (2025). LLM benchmarks dataset. https://artificialanalysis.ai","data":""}

Intelligence vs. Context Window

Artificial Analysis Intelligence Index; Context Window: Tokens Limit


Pricing


Pricing: Input and Output Prices

Price: USD per 1M Tokens

Price per token included in the request/message sent to the API, represented as USD per million Tokens.


Pricing: Cached Input Prompts

Price: USD per 1M Tokens
Full Analysis (Intro to Caching, Full Provider Details and More)

  • Input (standard): Price per token included in the request/message sent to the API, represented as USD per million tokens.
  • Cache Write: One-time cost charged when storing a prompt in the cache for future reuse, represented as USD per million tokens.
  • Cache Hit: Price per token for cached prompts (previously processed), typically offering a significant discount compared to the standard input price, represented as USD per million tokens.
  • Cache Storage per Hour: Cost to maintain tokens in cache storage, charged per million tokens per hour. Currently only applicable to Google's Gemini models.
  • Output (standard): Price per token generated by the model (received from the API), represented as USD per million tokens.
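As a worked sketch of how these components interact, consider a prompt whose long prefix is reused across requests. All prices and token counts below are hypothetical, and storage-per-hour fees (where applicable) are ignored:

```python
def cached_request_cost(cached_tokens: int, fresh_tokens: int, is_first_request: bool,
                        input_price: float, cache_write_price: float,
                        cache_hit_price: float) -> float:
    """Input-side cost (USD) of one request when a shared prompt prefix is cached.

    Prices are USD per 1M tokens; cache storage fees are ignored in this sketch.
    """
    prefix_price = cache_write_price if is_first_request else cache_hit_price
    return (cached_tokens / 1e6) * prefix_price + (fresh_tokens / 1e6) * input_price

# Hypothetical: 50k-token shared prefix, 2k fresh tokens per request,
# $3.00 standard input, $3.75 cache write, $0.30 cache hit (all per 1M tokens)
first = cached_request_cost(50_000, 2_000, True, 3.00, 3.75, 0.30)
repeat = cached_request_cost(50_000, 2_000, False, 3.00, 3.75, 0.30)
uncached = (52_000 / 1e6) * 3.00
print(round(first, 4), round(repeat, 4), round(uncached, 4))  # 0.1935 0.021 0.156
```

On these assumed prices, the first request costs more than an uncached one (the cache write premium), while every repeated request costs a fraction of the uncached price.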

Pricing: Image Input Pricing

Image Input Price: USD per 1k images at 1MP (1024x1024)

Price for 1,000 images at a resolution of 1 Megapixel (1024 x 1024) processed by the model.


Intelligence vs. Price (Log Scale)

Artificial Analysis Intelligence Index; Price: USD per 1M Tokens; Inspired by prior analysis by Swyx


Speed

Measured by Output Speed (tokens per second)


Output Speed

Output Tokens per Second; Higher is better

Output Speed by Input Token Count (Context Length)

Output Tokens per Second; Higher is better

Length of tokens provided in the request. See Prompt Options above to see benchmarks of different input prompt lengths across other charts.


Output Speed Variance

Output Tokens per Second; Results by percentile; Higher is better
Median; other points represent Min, 25th, 75th percentiles and Max respectively
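The box markers can be reproduced from repeated measurements of the same endpoint; a small sketch with hypothetical samples:

```python
import numpy as np

# Hypothetical repeated output-speed measurements (tokens/s) for one model endpoint
samples = [92.0, 101.5, 97.3, 110.2, 88.4, 105.9, 99.1, 94.7]

# Min, 25th percentile, median, 75th percentile and max, as plotted in the variance chart
lo, q25, median, q75, hi = np.percentile(samples, [0, 25, 50, 75, 100])
print(lo, q25, median, q75, hi)
```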


Output Speed vs. Price

Output Speed: Output Tokens per Second; Price: USD per 1M Tokens


Latency vs. Output Speed

Latency: Seconds to First Token Received; Output Speed: Output Tokens per Second

Time to first token received, in seconds, after API request sent. For reasoning models which share reasoning tokens, this will be the first reasoning token. For models which do not support streaming, this represents time to receive the completion.
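A minimal sketch of how both latency and output speed can be measured from a streaming response. The token stream here is a generic iterable standing in for whatever client library is used, not a specific SDK:

```python
import time

def measure_streaming(stream):
    """Measure time-to-first-token (TTFT) and output speed from a token stream.

    `stream` is any iterable that yields response chunks as they arrive
    (e.g. a streaming API client); it is a stand-in here, not a specific SDK.
    """
    start = time.perf_counter()
    first_token_at = None
    n_tokens = 0
    for chunk in stream:
        now = time.perf_counter()
        if first_token_at is None:
            first_token_at = now          # latency: time to first token
        n_tokens += 1
    end = time.perf_counter()
    ttft = first_token_at - start
    # Output speed counts only the generation phase, i.e. after the first chunk arrives.
    output_speed = (n_tokens - 1) / (end - first_token_at) if n_tokens > 1 else 0.0
    return ttft, output_speed
```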


Latency

Measured by Time (seconds) to First Token


Latency: Time To First Answer Token

Seconds to First Answer Token Received; Accounts for Reasoning Model 'Thinking' time

Time to first answer token received, in seconds, after API request sent. For reasoning models, this includes the 'thinking' time of the model before providing an answer. For models which do not support streaming, this represents time to receive the completion.

Latency: Time To First Token

Seconds to First Token Received; Lower is better

Time to First Token by Input Token Count (Context Length)

Seconds to First Token Received; Lower is better

Time to First Token Variance

Seconds to First Token Received; Results by percentile; Lower is better
Median; other points represent Min, 25th, 75th percentiles and Max respectively


End-to-End Response Time

Seconds to output 500 Tokens, calculated based on time to first token, 'thinking' time for reasoning models, and output speed


End-to-End Response Time

Seconds to Output 500 Tokens, including reasoning model 'thinking' time; Lower is better

Seconds to receive a 500 token response. Key components:

  • Input time: Time to receive the first response token
  • Thinking time (only for reasoning models): Time reasoning models spend outputting tokens to reason prior to providing an answer. Amount of tokens based on the average reasoning tokens across a diverse set of 60 prompts (methodology details).
  • Answer time: Time to generate 500 output tokens, based on output speed

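Putting those components together, a worked sketch with hypothetical measurements (in practice the reasoning-token count is the average across Artificial Analysis' prompt set):

```python
def end_to_end_seconds(ttft_s: float, reasoning_tokens: int,
                       output_tokens_per_s: float, answer_tokens: int = 500) -> float:
    """Estimated seconds to receive a full answer: input time + thinking time + answer time."""
    thinking_s = reasoning_tokens / output_tokens_per_s   # 0 for non-reasoning models
    answer_s = answer_tokens / output_tokens_per_s
    return ttft_s + thinking_s + answer_s

# Hypothetical: 0.5s to first token, 2,000 reasoning tokens, 100 tokens/s output speed
print(end_to_end_seconds(0.5, 2_000, 100.0))  # 25.5
```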

End-to-End Response Time by Input Token Count (Context Length)

Seconds to Output 500 Tokens, including reasoning model 'thinking' time; Lower is better

Seconds to receive a 500 token response considering input processing time, 'thinking' time of reasoning models, and output speed.


Model Size (Open Weights Models Only)


Model Size: Total and Active Parameters

Comparison between total model parameters and parameters active during inference
Broken down into active parameters and passive (non-active) parameters.

The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.

The number of parameters actually executed during each inference forward pass, expressed in billions. For Mixture of Experts (MoE) models, a routing mechanism selects a subset of experts per token, resulting in fewer active than total parameters. Dense models use all parameters, so active equals total.
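A small sketch of the distinction, using hypothetical architecture figures rather than any particular model:

```python
def moe_parameter_counts(shared_b: float, expert_b: float,
                         n_experts: int, experts_per_token: int):
    """Total vs. active parameters (in billions) for a Mixture of Experts model.

    shared_b: parameters used on every token (attention, embeddings, router, ...).
    expert_b: parameters per expert; only `experts_per_token` experts fire per token.
    """
    total = shared_b + n_experts * expert_b
    active = shared_b + experts_per_token * expert_b
    return total, active

# Hypothetical MoE: 12B shared parameters, 64 experts of 6B each, 4 experts routed per token
print(moe_parameter_counts(12, 6, 64, 4))   # (396, 36)
# A dense model uses every parameter on every token, so active == total.
```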

Intelligence vs. Active Parameters

Active Parameters at Inference Time; Artificial Analysis Intelligence Index


Intelligence vs. Total Parameters

Artificial Analysis Intelligence Index; Size in Parameters (Billions)


Further details
Model Name | Creator | License | Context Window | Further analysis
OpenAI
OpenAI logogpt-oss-20B (low)
OpenAI
Open
131k
OpenAI logogpt-oss-120B (high)
OpenAI
Open
131k
OpenAI logogpt-oss-20B (high)
OpenAI
Open
131k
OpenAI logogpt-oss-120B (low)
OpenAI
Open
131k
OpenAI logoGPT-5.1 (Non-reasoning)
OpenAI
Proprietary
400k
OpenAI logoGPT-5 (low)
OpenAI
Proprietary
400k
OpenAI logoGPT-5 mini (high)
OpenAI
Proprietary
400k
OpenAI logoGPT-5 mini (minimal)
OpenAI
Proprietary
400k
OpenAI logoGPT-5 (high)
OpenAI
Proprietary
400k
OpenAI logoGPT-5 (minimal)
OpenAI
Proprietary
400k
OpenAI logoGPT-5 (medium)
OpenAI
Proprietary
400k
OpenAI logoo3
OpenAI
Proprietary
200k
OpenAI logoGPT-5 nano (high)
OpenAI
Proprietary
400k
OpenAI logoGPT-5 (ChatGPT)
OpenAI
Proprietary
128k
OpenAI logoGPT-5 Codex (high)
OpenAI
Proprietary
400k
OpenAI logoGPT-5 nano (minimal)
OpenAI
Proprietary
400k
OpenAI logoGPT-5 nano (medium)
OpenAI
Proprietary
400k
OpenAI logoGPT-5 mini (medium)
OpenAI
Proprietary
400k
OpenAI logoGPT-5.1 (high)
OpenAI
Proprietary
400k
OpenAI logoo1
OpenAI
Proprietary
200k
OpenAI logoo1-preview
OpenAI
Proprietary
128k
OpenAI logoo1-mini
OpenAI
Proprietary
128k
OpenAI logoGPT-4o (Aug '24)
OpenAI
Proprietary
128k
OpenAI logoGPT-4o (May '24)
OpenAI
Proprietary
128k
OpenAI logoGPT-4 Turbo
OpenAI
Proprietary
128k
OpenAI logoGPT-4o (Nov '24)
OpenAI
Proprietary
128k
OpenAI logoGPT-4o mini
OpenAI
Proprietary
128k
OpenAI logoGPT-3.5 Turbo
OpenAI
Proprietary
4k
OpenAI logoGPT-3.5 Turbo (0613)
OpenAI
Proprietary
4k
OpenAI logoGPT-4.1
OpenAI
Proprietary
1m
OpenAI logoo3-mini (high)
OpenAI
Proprietary
200k
OpenAI logoGPT-4.1 nano
OpenAI
Proprietary
1m
OpenAI logoGPT-4.1 mini
OpenAI
Proprietary
1m
OpenAI logoGPT-4o mini Realtime (Dec '24)
OpenAI
Proprietary
128k
OpenAI logoo4-mini (high)
OpenAI
Proprietary
200k
OpenAI logoGPT-4o Realtime (Dec '24)
OpenAI
Proprietary
128k
OpenAI logoGPT-4.5 (Preview)
OpenAI
Proprietary
128k
OpenAI logoGPT-4o (ChatGPT)
OpenAI
Proprietary
128k
OpenAI logoGPT-4o (March 2025, chatgpt-4o-latest)
OpenAI
Proprietary
128k
OpenAI logoo1-pro
OpenAI
Proprietary
200k
OpenAI logoo3-mini
OpenAI
Proprietary
200k
OpenAI logoGPT-4
OpenAI
Proprietary
8k
OpenAI logoo3-pro
OpenAI
Proprietary
200k
xAI
xAI logoGrok-1
xAI
Open
8k
xAI logoGrok 4 Fast (Non-reasoning)
xAI
Proprietary
2m
xAI logoGrok Code Fast 1
xAI
Proprietary
256k
xAI logoGrok 4.1 Fast (Reasoning)
xAI
Proprietary
2m
xAI logoGrok 4 Fast (Reasoning)
xAI
Proprietary
2m
xAI logoGrok 4
xAI
Proprietary
256k
xAI logoGrok 3 mini Reasoning (high)
xAI
Proprietary
1m
xAI logoGrok 4.1 Fast (Non-reasoning)
xAI
Proprietary
2m
xAI logoGrok Beta
xAI
Proprietary
128k
xAI logoGrok 3
xAI
Proprietary
1m
xAI logoGrok 3 Reasoning Beta
xAI
Proprietary
1m
xAI logoGrok 2 (Dec '24)
xAI
Open
131k
Meta
Meta logoLlama 3.3 Instruct 70B
Meta
Open
128k
Meta logoLlama 3.1 Instruct 405B
Meta
Open
128k
Meta logoLlama 3.2 Instruct 90B (Vision)
Meta
Open
128k
Meta logoLlama 3.2 Instruct 11B (Vision)
Meta
Open
128k
Meta logoLlama 4 Scout
Meta
Open
10m
Meta logoLlama 4 Maverick
Meta
Open
1m
Meta logoLlama 65B
Meta
Open
2k
Meta logoLlama 3.1 Instruct 70B
Meta
Open
128k
Meta logoLlama 3.1 Instruct 8B
Meta
Open
128k
Meta logoLlama 3.2 Instruct 3B
Meta
Open
128k
Meta logoLlama 3 Instruct 70B
Meta
Open
8k
Meta logoLlama 3 Instruct 8B
Meta
Open
8k
Meta logoLlama 3.2 Instruct 1B
Meta
Open
128k
Meta logoLlama 2 Chat 70B
Meta
Open
4k
Meta logoLlama 2 Chat 13B
Meta
Open
4k
Meta logoLlama 2 Chat 7B
Meta
Open
4k
Google
Google logoGemini 3 Pro Preview
Google
Proprietary
1m
Google logoGemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning)
Google
Proprietary
1m
Google logoGemma 3 1B Instruct
Google
Open
32k
Google logoGemma 3n E4B Instruct
Google
Open
32k
Google logoGemini 2.5 Flash Preview (Sep '25) (Reasoning)
Google
Proprietary
1m
Google logoGemma 3 27B Instruct
Google
Open
128k
Google logoGemma 3 270M
Google
Open
32k
Google logoGemini 2.5 Pro
Google
Proprietary
1m
Google logoGemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning)
Google
Proprietary
1m
Google logoGemma 3n E2B Instruct
Google
Open
32k
Google logoGemma 3 12B Instruct
Google
Open
128k
Google logoGemma 3 4B Instruct
Google
Open
128k
Google logoGemini 2.5 Flash Preview (Sep '25) (Non-reasoning)
Google
Proprietary
1m
Google logoGemini 2.0 Pro Experimental (Feb '25)
Google
Proprietary
2m
Google logoGemini 2.0 Flash (experimental)
Google
Proprietary
1m
Google logoGemini 1.5 Pro (Sep '24)
Google
Proprietary
2m
Google logoGemini 2.0 Flash-Lite (Preview)
Google
Proprietary
1m
Google logoGemini 2.0 Flash (Feb '25)
Google
Proprietary
1m
Google logoGemini 1.5 Flash (Sep '24)
Google
Proprietary
1m
Google logoGemma 2 27B
Google
Open
8k
Google logoGemma 2 9B
Google
Open
8k
Google logoGemini 1.5 Flash-8B
Google
Proprietary
1m
Google logoGemini 2.0 Flash Thinking Experimental (Jan '25)
Google
Proprietary
1m
Google logoGemini 2.5 Flash-Lite (Reasoning)
Google
Proprietary
1m
Google logoGemini 1.0 Pro
Google
Proprietary
33k
Google logoGemini 1.5 Pro (May '24)
Google
Proprietary
2m
Google logoGemini 1.0 Ultra
Google
Proprietary
33k
Google logoGemini 2.5 Flash Preview (Non-reasoning)
Google
Proprietary
1m
Google logoGemini 1.5 Flash (May '24)
Google
Proprietary
1m
Google logoGemini 2.5 Pro Preview (May' 25)
Google
Proprietary
1m
Google logoGemini 2.5 Flash (Non-reasoning)
Google
Proprietary
1m
Google logoGemini 2.5 Flash Preview (Reasoning)
Google
Proprietary
1m
Google logoGemini 2.0 Flash-Lite (Feb '25)
Google
Proprietary
1m
Google logoGemini 2.0 Flash Thinking Experimental (Dec '24)
Google
Proprietary
2m
Google logoGemini 2.5 Flash (Reasoning)
Google
Proprietary
1m
Google logoGemini 2.5 Flash-Lite (Non-reasoning)
Google
Proprietary
1m
Google logoGemma 3n E4B Instruct Preview (May '25)
Google
Open
32k
Google logoGemini 2.5 Pro Preview (Mar' 25)
Google
Proprietary
1m
Google logoPALM-2
Google
Proprietary
8k
Anthropic
Anthropic logoClaude 4.5 Sonnet (Reasoning)
Anthropic
Proprietary
1m
Anthropic logoClaude 4.1 Opus (Reasoning)
Anthropic
Proprietary
200k
Anthropic logoClaude 4.5 Sonnet (Non-reasoning)
Anthropic
Proprietary
1m
Anthropic logoClaude 4.1 Opus (Non-reasoning)
Anthropic
Proprietary
200k
Anthropic logoClaude 4.5 Haiku (Reasoning)
Anthropic
Proprietary
200k
Anthropic logoClaude 4.5 Haiku (Non-reasoning)
Anthropic
Proprietary
200k
Anthropic logoClaude 3.5 Sonnet (Oct '24)
Anthropic
Proprietary
200k
Anthropic logoClaude 3.5 Sonnet (June '24)
Anthropic
Proprietary
200k
Anthropic logoClaude 3 Opus
Anthropic
Proprietary
200k
Anthropic logoClaude 3.5 Haiku
Anthropic
Proprietary
200k
Anthropic logoClaude 3 Sonnet
Anthropic
Proprietary
200k
Anthropic logoClaude 3 Haiku
Anthropic
Proprietary
200k
Anthropic logoClaude Instant
Anthropic
Proprietary
100k
Anthropic logoClaude 4 Opus (Non-reasoning)
Anthropic
Proprietary
200k
Anthropic logoClaude 4 Sonnet (Reasoning)
Anthropic
Proprietary
1m
Anthropic logoClaude 3.7 Sonnet (Non-reasoning)
Anthropic
Proprietary
200k
Anthropic logoClaude 2.1
Anthropic
Proprietary
200k
Anthropic logoClaude 3.7 Sonnet (Reasoning)
Anthropic
Proprietary
200k
Anthropic logoClaude 4 Opus (Reasoning)
Anthropic
Proprietary
200k
Anthropic logoClaude 4 Sonnet (Non-reasoning)
Anthropic
Proprietary
1m
Anthropic logoClaude 2.0
Anthropic
Proprietary
100k
Mistral
Mistral logoMinistral 8B
Mistral
Open
128k
Mistral logoMinistral 3B
Mistral
Proprietary
128k
Mistral logoMistral Medium 3.1
Mistral
Proprietary
128k
Mistral logoDevstral Small (Jul '25)
Mistral
Open
256k
Mistral logoCodestral (Jan '25)
Mistral
Proprietary
256k
Mistral logoMagistral Medium 1.2
Mistral
Proprietary
128k
Mistral logoMagistral Small 1.2
Mistral
Open
128k
Mistral logoDevstral Medium
Mistral
Proprietary
256k
Mistral logoMistral Small 3.2
Mistral
Open
128k
Mistral logoMistral Large 2 (Nov '24)
Mistral
Open
128k
Mistral logoMistral Large 2 (Jul '24)
Mistral
Open
128k
Mistral logoPixtral Large
Mistral
Open
128k
Mistral logoMistral Small 3
Mistral
Open
32k
Mistral logoMistral Small (Sep '24)
Mistral
Open
33k
Mistral logoMixtral 8x22B Instruct
Mistral
Open
65k
Mistral logoMistral Small (Feb '24)
Mistral
Proprietary
33k
Mistral logoMistral Large (Feb '24)
Mistral
Proprietary
33k
Mistral logoPixtral 12B (2409)
Mistral
Open
128k
Mistral logoMistral NeMo
Mistral
Open
128k
Mistral logoMixtral 8x7B Instruct
Mistral
Open
33k
Mistral logoCodestral-Mamba
Mistral
Open
256k
Mistral logoMistral 7B Instruct
Mistral
Open
8k
Mistral logoDevstral Small (May '25)
Mistral
Open
256k
Mistral logoMistral Small 3.1
Mistral
Open
128k
Mistral logoCodestral (May '24)
Mistral
Open
33k
Mistral logoMistral Saba
Mistral
Proprietary
32k
Mistral logoMistral Medium
Mistral
Proprietary
33k
Mistral logoMagistral Small 1
Mistral
Open
40k
Mistral logoMagistral Medium 1
Mistral
Proprietary
40k
Mistral logoMistral Medium 3
Mistral
Proprietary
128k
DeepSeek
DeepSeek logoDeepSeek R1 Distill Llama 70B
DeepSeek
Open
128k
DeepSeek logoDeepSeek V3.2 Exp (Reasoning)
DeepSeek
Open
128k
DeepSeek logoDeepSeek R1 0528 Qwen3 8B
DeepSeek
Open
33k
DeepSeek logoDeepSeek V3.2 Exp (Non-reasoning)
DeepSeek
Open
128k
DeepSeek logoDeepSeek R1 0528 (May '25)
DeepSeek
Open
128k
DeepSeek logoDeepSeek V3.1 Terminus (Non-reasoning)
DeepSeek
Open
128k
DeepSeek logoDeepSeek V3.1 Terminus (Reasoning)
DeepSeek
Open
128k
DeepSeek logoDeepSeek-OCR
DeepSeek
Open
8k
DeepSeek logoDeepSeek R1 Distill Qwen 32B
DeepSeek
Open
128k
DeepSeek logoDeepSeek V3 (Dec '24)
DeepSeek
Open
128k
DeepSeek logoDeepSeek R1 Distill Qwen 14B
DeepSeek
Open
128k
DeepSeek logoDeepSeek-V2.5 (Dec '24)
DeepSeek
Open
128k
DeepSeek logoDeepSeek-Coder-V2
DeepSeek
Open
128k
DeepSeek logoDeepSeek R1 Distill Llama 8B
DeepSeek
Open
128k
DeepSeek logoDeepSeek LLM 67B Chat (V1)
DeepSeek
Open
4k
DeepSeek logoDeepSeek R1 Distill Qwen 1.5B
DeepSeek
Open
128k
DeepSeek logoDeepSeek V3.1 (Non-reasoning)
DeepSeek
Open
128k
DeepSeek logoDeepSeek R1 (Jan '25)
DeepSeek
Open
128k
DeepSeek logoDeepSeek V3.1 (Reasoning)
DeepSeek
Open
128k
DeepSeek logoDeepSeek V3 0324
DeepSeek
Open
128k
DeepSeek logoDeepSeek Coder V2 Lite Instruct
DeepSeek
Open
128k
DeepSeek logoDeepSeek-V2.5
DeepSeek
Open
128k
DeepSeek logoDeepSeek-V2-Chat
DeepSeek
Open
128k
Perplexity
Perplexity logoR1 1776
Perplexity
Open
128k
Perplexity logoSonar Pro
Perplexity
Proprietary
200k
Perplexity logoSonar
Perplexity
Proprietary
127k
Perplexity logoSonar Reasoning Pro
Perplexity
Proprietary
127k
Perplexity logoSonar Reasoning
Perplexity
Proprietary
127k
Amazon
Amazon logoNova Pro
Amazon
Proprietary
300k
Amazon logoNova Lite
Amazon
Proprietary
300k
Amazon logoNova Micro
Amazon
Proprietary
130k
Amazon logoNova Premier
Amazon
Proprietary
1m
Microsoft Azure logoMicrosoft Azure
Microsoft Azure logoPhi-4
Microsoft Azure
Open
16k
Microsoft Azure logoPhi-4 Mini Instruct
Microsoft Azure
Open
128k
Microsoft Azure logoPhi-4 Multimodal Instruct
Microsoft Azure
Open
128k
Microsoft Azure logoPhi-3 Medium Instruct 14B
Microsoft Azure
Open
128k
Microsoft Azure logoPhi-3 Mini Instruct 3.8B
Microsoft Azure
Open
4k
Liquid AI logoLiquid AI
Liquid AI logoLFM2 1.2B
Liquid AI
Open
33k
Liquid AI logoLFM2 2.6B
Liquid AI
Open
33k
Liquid AI logoLFM2 8B A1B
Liquid AI
Open
33k
Liquid AI logoLFM 40B
Liquid AI
Proprietary
32k
Upstage logoUpstage
Upstage logoSolar Pro 2 (Reasoning)
Upstage
Proprietary
66k
Upstage logoSolar Pro 2 (Non-reasoning)
Upstage
Proprietary
66k
Upstage logoSolar Mini
Upstage
Open
4k
Upstage logoSolar Pro 2 (Preview) (Non-reasoning)
Upstage
Proprietary
64k
Upstage logoSolar Pro 2 (Preview) (Reasoning)
Upstage
Proprietary
64k
MiniMax logoMiniMax
MiniMax logoMiniMax-Text-01
MiniMax
Open
4m
MiniMax logoMiniMax-M2
MiniMax
Open
205k
MiniMax logoMiniMax M1 40k
MiniMax
Open
1m
MiniMax logoMiniMax M1 80k
MiniMax
Open
1m
NVIDIA logoNVIDIA
NVIDIA logoLlama 3.1 Nemotron Instruct 70B
NVIDIA
Open
128k
NVIDIA logoLlama Nemotron Super 49B v1.5 (Non-reasoning)
NVIDIA
Open
128k
NVIDIA logoLlama 3.3 Nemotron Super 49B v1 (Reasoning)
NVIDIA
Open
128k
NVIDIA logoLlama 3.3 Nemotron Super 49B v1 (Non-reasoning)
NVIDIA
Open
128k
NVIDIA logoLlama 3.1 Nemotron Ultra 253B v1 (Reasoning)
NVIDIA
Open
128k
NVIDIA logoNVIDIA Nemotron Nano 9B V2 (Reasoning)
NVIDIA
Open
131k
NVIDIA logoNVIDIA Nemotron Nano 9B V2 (Non-reasoning)
NVIDIA
Open
131k
NVIDIA logoLlama 3.1 Nemotron Nano 4B v1.1 (Reasoning)
NVIDIA
Open
128k
NVIDIA logoLlama Nemotron Super 49B v1.5 (Reasoning)
NVIDIA
Open
128k
Moonshot AI logoMoonshot AI
Moonshot AI logoKimi K2 Thinking
Moonshot AI
Open
256k
Moonshot AI logoKimi K2 0905
Moonshot AI
Open
256k
Moonshot AI logoKimi Linear 48B A3B Instruct
Moonshot AI
Open
1m
Moonshot AI logoKimi K2
Moonshot AI
Open
128k
Allen Institute for AI logoAllen Institute for AI
Allen Institute for AI logoOLMo 2 32B
Allen Institute for AI
Open
4k
Allen Institute for AI logoOLMo 2 7B
Allen Institute for AI
Open
4k
Allen Institute for AI logoMolmo 7B-D
Allen Institute for AI
Open
4k
Allen Institute for AI logoLlama 3.1 Tulu3 405B
Allen Institute for AI
Open
128k
IBM logoIBM
IBM logoGranite 4.0 H 350M
IBM
Open
33k
IBM logoGranite 4.0 350M
IBM
Open
33k
IBM logoGranite 4.0 H 1B
IBM
Open
128k
IBM logoGranite 4.0 1B
IBM
Open
128k
IBM logoGranite 4.0 H Small
IBM
Open
128k
IBM logoGranite 4.0 Micro
IBM
Open
128k
IBM logoGranite 3.3 8B (Non-reasoning)
IBM
Open
128k
Reka AI logoReka AI
Reka AI logoReka Flash 3
Reka AI
Open
128k
Reka AI logoReka Flash (Sep '24)
Reka AI
Proprietary
128k
Reka AI logoReka Core
Reka AI
Proprietary
128k
Reka AI logoReka Flash (Feb '24)
Reka AI
Proprietary
128k
Reka AI logoReka Edge
Reka AI
Proprietary
128k
Nous Research logoNous Research
Nous Research logoDeepHermes 3 - Mistral 24B Preview (Non-reasoning)
Nous Research
Open
32k
Nous Research logoHermes 4 - Llama-3.1 70B (Reasoning)
Nous Research
Open
128k
Nous Research logoDeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning)
Nous Research
Open
128k
Nous Research logoHermes 4 - Llama-3.1 405B (Non-reasoning)
Nous Research
Open
128k
Nous Research logoHermes 4 - Llama-3.1 70B (Non-reasoning)
Nous Research
Open
128k
Nous Research logoHermes 4 - Llama-3.1 405B (Reasoning)
Nous Research
Open
128k
Nous Research logoHermes 3 - Llama-3.1 70B
Nous Research
Open
128k
LG AI Research logoLG AI Research
LG AI Research logoEXAONE 4.0 32B (Non-reasoning)
LG AI Research
Open
131k
LG AI Research logoExaone 4.0 1.2B (Non-reasoning)
LG AI Research
Open
64k
LG AI Research logoEXAONE 4.0 32B (Reasoning)
LG AI Research
Open
131k
LG AI Research logoExaone 4.0 1.2B (Reasoning)
LG AI Research
Open
64k
Baidu logoBaidu
Baidu logoERNIE 4.5 300B A47B
Baidu
Open
131k
Deep Cogito logoDeep Cogito
Deep Cogito logoCogito v2.1 (Reasoning)
Deep Cogito
Open
128k
Z AI logoZ AI
Z AI logoGLM-4.5-Air
Z AI
Open
128k
Z AI logoGLM-4.6 (Reasoning)
Z AI
Open
200k
Z AI logoGLM-4.5V (Reasoning)
Z AI
Open
64k
Z AI logoGLM-4.6 (Non-reasoning)
Z AI
Open
200k
Z AI logoGLM-4.5V (Non-reasoning)
Z AI
Open
64k
Z AI logoGLM-4.5 (Reasoning)
Z AI
Open
128k
Cohere logoCohere
Cohere logoAya Expanse 32B
Cohere
Open
128k
Cohere logoAya Expanse 8B
Cohere
Open
8k
Cohere logoCommand A
Cohere
Open
256k
Cohere logoCommand-R+ (Aug '24)
Cohere
Open
128k
Cohere logoCommand-R+ (Apr '24)
Cohere
Open
128k
Cohere logoCommand-R (Aug '24)
Cohere
Open
128k
Cohere logoCommand-R (Mar '24)
Cohere
Open
128k
ServiceNow logoServiceNow
ServiceNow logoApriel-v1.5-15B-Thinker
ServiceNow
Open
128k
AI21 Labs logoAI21 Labs
AI21 Labs logoJamba 1.7 Large
AI21 Labs
Open
256k
AI21 Labs logoJamba Reasoning 3B
AI21 Labs
Open
262k
AI21 Labs logoJamba 1.7 Mini
AI21 Labs
Open
258k
AI21 Labs logoJamba Instruct
AI21 Labs
Proprietary
256k
AI21 Labs logoJamba 1.6 Mini
AI21 Labs
Open
256k
AI21 Labs logoJamba 1.6 Large
AI21 Labs
Open
256k
AI21 Labs logoJamba 1.5 Mini
AI21 Labs
Open
256k
AI21 Labs logoJamba 1.5 Large
AI21 Labs
Open
256k
Alibaba logoAlibaba
Alibaba logoQwen3 Coder 480B A35B Instruct
Alibaba
Open
262k
Alibaba logoQwen3 4B 2507 Instruct
Alibaba
Open
262k
Alibaba logoQwen3 4B 2507 (Reasoning)
Alibaba
Open
262k
Alibaba logoQwen3 235B A22B 2507 Instruct
Alibaba
Open
256k
Alibaba logoQwen3 Coder 30B A3B Instruct
Alibaba
Open
262k
Alibaba logoQwen3 VL 32B (Reasoning)
Alibaba
Open
256k
Alibaba logoQwen3 VL 32B Instruct
Alibaba
Open
256k
Alibaba logoQwen3 235B A22B 2507 (Reasoning)
Alibaba
Open
256k
Alibaba logoQwen3 Next 80B A3B Instruct
Alibaba
Open
262k
Alibaba logoQwen3 Next 80B A3B (Reasoning)
Alibaba
Open
262k
Alibaba logoQwen3 30B A3B 2507 (Reasoning)
Alibaba
Open
262k
Alibaba logoQwen3 30B A3B 2507 Instruct
Alibaba
Open
262k
Alibaba logoQwen3 Omni 30B A3B (Reasoning)
Alibaba
Open
66k
Alibaba logoQwen3 VL 235B A22B Instruct
Alibaba
Open
262k
Alibaba logoQwen3 Omni 30B A3B Instruct
Alibaba
Open
66k
Alibaba logoQwen3 VL 30B A3B Instruct
Alibaba
Open
256k
Alibaba logoQwen3 VL 30B A3B (Reasoning)
Alibaba
Open
256k
Alibaba logoQwen3 Max
Alibaba
Proprietary
262k
Alibaba logoQwen3 VL 235B A22B (Reasoning)
Alibaba
Open
262k
Alibaba logoQwen3 Max Thinking
Alibaba
Proprietary
262k
Alibaba logoQwen3 VL 4B Instruct
Alibaba
Open
256k
Alibaba logoQwen3 VL 8B Instruct
Alibaba
Open
256k
Alibaba logoQwen3 VL 8B (Reasoning)
Alibaba
Open
256k
Alibaba logoQwen3 VL 4B (Reasoning)
Alibaba
Open
256k
Alibaba logoQwen Chat 14B
Alibaba
Open
8k
Alibaba logoQwen2.5 Max
Alibaba
Proprietary
32k
Alibaba logoQwen2.5 Instruct 72B
Alibaba
Open
131k
Alibaba logoQwen2.5 Coder Instruct 32B
Alibaba
Open
131k
Alibaba logoQwen2.5 Turbo
Alibaba
Proprietary
1m
Alibaba logoQwen2 Instruct 72B
Alibaba
Open
131k
Alibaba logoQwen3 32B (Non-reasoning)
Alibaba
Open
33k
Alibaba logoQwen3 4B (Non-reasoning)
Alibaba
Open
32k
Alibaba logoQwen2.5 Instruct 32B
Alibaba
Open
128k
Alibaba logoQwen3 30B A3B (Reasoning)
Alibaba
Open
33k
Alibaba logoQwen3 235B A22B (Reasoning)
Alibaba
Open
33k
Alibaba logoQwen3 32B (Reasoning)
Alibaba
Open
33k
Alibaba logoQwen3 14B (Non-reasoning)
Alibaba
Open
33k
Alibaba logoQwen3 1.7B (Non-reasoning)
Alibaba
Open
32k
Alibaba logoQwen3 8B (Non-reasoning)
Alibaba
Open
33k
Alibaba logoQwen3 8B (Reasoning)
Alibaba
Open
131k
Alibaba logoQwQ 32B
Alibaba
Open
131k
Alibaba logoQwen3 235B A22B (Non-reasoning)
Alibaba
Open
33k
Alibaba logoQwQ 32B-Preview
Alibaba
Open
33k
Alibaba logoQwen3 4B (Reasoning)
Alibaba
Open
32k
Alibaba logoQwen3 0.6B (Non-reasoning)
Alibaba
Open
32k
Alibaba logoQwen3 30B A3B (Non-reasoning)
Alibaba
Open
33k
Alibaba logoQwen2.5 Coder Instruct 7B
Alibaba
Open
131k
Alibaba logoQwen3 14B (Reasoning)
Alibaba
Open
33k
Alibaba logoQwen Chat 72B
Alibaba
Open
34k
Alibaba logoQwen3 1.7B (Reasoning)
Alibaba
Open
32k
Alibaba logoQwen3 Max (Preview)
Alibaba
Proprietary
262k
Alibaba logoQwen3 0.6B (Reasoning)
Alibaba
Open
32k
Alibaba logoQwen1.5 Chat 110B
Alibaba
Open
32k
InclusionAI logoInclusionAI
InclusionAI logoRing-flash-2.0
InclusionAI
Open
128k
InclusionAI logoRing-1T
InclusionAI
Open
128k
InclusionAI logoLing-mini-2.0
InclusionAI
Open
131k
InclusionAI logoLing-flash-2.0
InclusionAI
Open
128k
InclusionAI logoLing-1T
InclusionAI
Open
128k
ByteDance Seed logoByteDance Seed
ByteDance Seed logoSeed-OSS-36B-Instruct
ByteDance Seed
Open
512k
ByteDance Seed logoDoubao Seed Code
ByteDance Seed
Proprietary
256k
OpenChat logoOpenChat
OpenChat logoOpenChat 3.5 (1210)
OpenChat
Open
8k
Databricks logoDatabricks
Databricks logoDBRX Instruct
Databricks
Open
33k
Snowflake logoSnowflake
Snowflake logoArctic Instruct
Snowflake
Open
4k
01.AI logo01.AI
01.AI logoYi-Large
01.AI
Proprietary
32k
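The context-window figures above use the page's shorthand notation (k for thousands of tokens, m for millions). As a minimal sketch, assuming rows from the table have been transcribed into simple tuples, the snippet below shows how that shorthand could be normalised so models can be filtered or sorted by context length; the helper name `context_to_tokens` and the sample rows are illustrative, not part of any Artificial Analysis tooling.

```python
# Illustrative sketch (not from the source page): normalise the shorthand
# context-window strings used in the table ("8k", "256k", "1m", "4m") so the
# entries can be sorted or filtered programmatically.

def context_to_tokens(value: str) -> int:
    """Convert a shorthand context-window string such as '128k' or '1m' to a token count."""
    value = value.strip().lower()
    if value.endswith("m"):
        return int(float(value[:-1]) * 1_000_000)
    if value.endswith("k"):
        return int(float(value[:-1]) * 1_000)
    return int(value)

# A few rows copied from the table: (model, creator, license, context window).
rows = [
    ("MiniMax-Text-01", "MiniMax", "Open", "4m"),
    ("Qwen2.5 Turbo", "Alibaba", "Proprietary", "1m"),
    ("Claude 3.7 Sonnet (Reasoning)", "Anthropic", "Proprietary", "200k"),
    ("DeepSeek-OCR", "DeepSeek", "Open", "8k"),
]

# Keep only open-weights models and sort by context window, largest first.
open_by_context = sorted(
    (r for r in rows if r[2] == "Open"),
    key=lambda r: context_to_tokens(r[3]),
    reverse=True,
)
for model, creator, license_, ctx in open_by_context:
    print(f"{model} ({creator}): {context_to_tokens(ctx):,} tokens")
```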

Models compared:

  • OpenAI: GPT 4o Audio, GPT 4o Realtime, GPT 4o Speech Pipeline, GPT Realtime, GPT Realtime Mini (Oct '25), GPT-3.5 Turbo, GPT-3.5 Turbo (0125), GPT-3.5 Turbo (0301), GPT-3.5 Turbo (0613), GPT-3.5 Turbo (1106), GPT-3.5 Turbo Instruct, GPT-4, GPT-4 Turbo, GPT-4 Turbo (0125), GPT-4 Turbo (1106), GPT-4 Vision, GPT-4.1, GPT-4.1 mini, GPT-4.1 nano, GPT-4.5 (Preview), GPT-4o (Apr), GPT-4o (Aug), GPT-4o (ChatGPT), GPT-4o (Mar), GPT-4o (May), GPT-4o (Nov), GPT-4o Realtime (Dec), GPT-4o mini, GPT-4o mini Realtime (Dec), GPT-5 (ChatGPT), GPT-5 (high), GPT-5 (low), GPT-5 (medium), GPT-5 (minimal), GPT-5 Codex (high), GPT-5 Pro (high), GPT-5 mini (high), GPT-5 mini (medium), GPT-5 mini (minimal), GPT-5 nano (high), GPT-5 nano (medium), GPT-5 nano (minimal), GPT-5.1, GPT-5.1 (high), GPT-5.1 Codex (high), GPT-5.1 Codex mini (high), gpt-oss-120B (high), gpt-oss-120B (low), gpt-oss-20B (high), gpt-oss-20B (low), o1, o1-mini, o1-preview, o1-pro, o3, o3-mini, o3-mini (high), o3-pro, and o4-mini (high)
  • Meta: Code Llama 70B, Llama 2 Chat 13B, Llama 2 Chat 70B, Llama 2 Chat 7B, Llama 3 70B, Llama 3 8B, Llama 3.1 405B, Llama 3.1 70B, Llama 3.1 8B, Llama 3.2 11B (Vision), Llama 3.2 1B, Llama 3.2 3B, Llama 3.2 90B (Vision), Llama 3.3 70B, Llama 4 Behemoth, Llama 4 Maverick, Llama 4 Scout, and Llama 65B
  • Google: Gemini 1.0 Pro, Gemini 1.0 Ultra, Gemini 1.5 Flash (May), Gemini 1.5 Flash (Sep), Gemini 1.5 Flash-8B, Gemini 1.5 Pro (May), Gemini 1.5 Pro (Sep), Gemini 2.0 Flash, Gemini 2.0 Flash (exp), Gemini 2.0 Flash Thinking exp. (Dec), Gemini 2.0 Flash Thinking exp. (Jan), Gemini 2.0 Flash-Lite (Feb), Gemini 2.0 Flash-Lite (Preview), Gemini 2.0 Pro Experimental, Gemini 2.5 Flash, Gemini 2.5 Flash Live Preview, Gemini 2.5 Flash Native Audio, Gemini 2.5 Flash Native Audio Dialog, Gemini 2.5 Flash (Sep), Gemini 2.5 Flash-Lite, Gemini 2.5 Flash-Lite (Sep), Gemini 2.5 Pro, Gemini 2.5 Pro (Mar), Gemini 2.5 Pro (May), Gemini 3 Pro Preview, Gemini 3 Pro Preview (low), Gemini Experimental (Nov), Gemma 2 27B, Gemma 2 2B, Gemma 2 9B, Gemma 3 12B, Gemma 3 1B, Gemma 3 270M, Gemma 3 27B, Gemma 3 4B, Gemma 3n E2B, Gemma 3n E4B, Gemma 3n E4B (May), Gemma 7B, PALM-2, and Whisperwind
  • Anthropic: Claude 2.0, Claude 2.1, Claude 3 Haiku, Claude 3 Opus, Claude 3 Sonnet, Claude 3.5 Haiku, Claude 3.5 Sonnet (June), Claude 3.5 Sonnet (Oct), Claude 3.7 Sonnet, Claude 4 Opus, Claude 4 Sonnet, Claude 4.1 Opus, Claude 4.5 Haiku, Claude 4.5 Sonnet, and Claude Instant
  • Mistral: Codestral (Jan), Codestral (May), Codestral-Mamba, Devstral Medium, Devstral Small, Devstral Small (May), Magistral Medium 1, Magistral Medium 1.1, Magistral Medium 1.2, Magistral Small 1, Magistral Small 1.1, Magistral Small 1.2, Ministral 3B, Ministral 8B, Mistral 7B, Mistral Large (Feb), Mistral Large 2 (Jul), Mistral Large 2 (Nov), Mistral Medium, Mistral Medium 3, Mistral Medium 3.1, Mistral NeMo, Mistral Saba, Mistral Small (Feb), Mistral Small (Sep), Mistral Small 3, Mistral Small 3.1, Mistral Small 3.2, Mixtral 8x22B, Mixtral 8x7B, Pixtral 12B, and Pixtral Large
  • DeepSeek: DeepSeek Coder V2 Lite, DeepSeek LLM 67B (V1), DeepSeek Prover V2 671B, DeepSeek R1 (FP4), DeepSeek R1 (Jan), DeepSeek R1 0528, DeepSeek R1 0528 Qwen3 8B, DeepSeek R1 Distill Llama 70B, DeepSeek R1 Distill Llama 8B, DeepSeek R1 Distill Qwen 1.5B, DeepSeek R1 Distill Qwen 14B, DeepSeek R1 Distill Qwen 32B, DeepSeek V3 (Dec), DeepSeek V3 0324, DeepSeek V3.1, DeepSeek V3.1 Terminus, DeepSeek V3.2 Exp, DeepSeek-Coder-V2, DeepSeek-OCR, DeepSeek-V2, DeepSeek-V2.5, DeepSeek-V2.5 (Dec), DeepSeek-VL2, and Janus Pro 7B
  • Perplexity: PPLX-70B Online, PPLX-7B-Online, R1 1776, Sonar, Sonar 3.1 Huge, Sonar 3.1 Large, Sonar 3.1 Small, Sonar Large, Sonar Pro, Sonar Reasoning, Sonar Reasoning Pro, and Sonar Small
  • xAI: Grok 2, Grok 3, Grok 3 Reasoning Beta, Grok 3 mini, Grok 3 mini Reasoning (low), Grok 3 mini Reasoning (high), Grok 4, Grok 4 Fast, Grok 4 Fast 1111 (Reasoning), Grok 4 mini (0908), Grok 4.1 Fast, Grok 4.1 Fast v4, Grok Beta, Grok Code Fast 1, Grok-1, and test model
  • OpenChat: OpenChat 3.5
  • Amazon: Nova 2.0 Lite, Nova 2.0 Lite (high), Nova 2.0 Lite (low), Nova 2.0 Lite (medium), Nova Lite, Nova Micro, Nova Premier, and Nova Pro
  • Microsoft Azure: Phi-3 Medium 14B, Phi-3 Mini, Phi-4, Phi-4 Mini, Phi-4 Multimodal, Phi-4 mini reasoning, Phi-4 reasoning, Phi-4 reasoning plus, and Yosemite
  • Liquid AI: LFM 1.3B, LFM 3B, LFM 40B, LFM2 1.2B, LFM2 2.6B, and LFM2 8B A1B
  • Upstage: Solar Mini, Solar Pro, Solar Pro (Nov), Solar Pro 2, and Solar Pro 2
  • Databricks: DBRX
  • MiniMax: MiniMax M1 40k, MiniMax M1 80k, MiniMax-M2, and MiniMax-Text-01
  • NVIDIA: Cosmos Nemotron 34B, Llama 3.1 Nemotron 70B, Llama 3.1 Nemotron Nano 4B v1.1, Llama 3.1 Nemotron Nano 8B, Llama 3.3 Nemotron Nano 8B, Llama Nemotron Ultra, Llama 3.3 Nemotron Super 49B, Llama Nemotron Super 49B v1.5, NVIDIA Nemotron Nano 12B v2 VL, and NVIDIA Nemotron Nano 9B V2
  • StepFun: Step-2, Step-2-Mini, Step3, step-1-128k, step-1-256k, step-1-32k, step-1-8k, step-1-flash, step-2-16k-exp, and step-r1-v-mini
  • IBM: Granite 3.0 2B, Granite 3.0 8B, Granite 3.3 8B, Granite 4.0 1B, Granite 4.0 350M, Granite 4.0 8B, Granite 4.0 H 1B, Granite 4.0 H 350M, Granite 4.0 H Small, Granite 4.0 Micro, Granite 4.0 Tiny, and Granite Vision 3.3 2B
  • Inceptionlabs: Mercury, Mercury Coder Mini, Mercury Coder Small, and Mercury Instruct
  • Reka AI: Reka Core, Reka Edge, Reka Flash (Feb), Reka Flash, Reka Flash 3, and Reka Flash 3.1
  • LG AI Research: EXAONE 4.0 32B, EXAONE Deep 32B, and Exaone 4.0 1.2B
  • Xiaomi: MiMo 7B RL
  • Baidu: ERNIE 4.5, ERNIE 4.5 0.3B, ERNIE 4.5 21B A3B, ERNIE 4.5 300B A47B, ERNIE 4.5 VL 28B A3B, ERNIE 4.5 VL 424B A47B, ERNIE 5.0 Thinking Preview, and ERNIE X1
  • Baichuan: Baichuan 4 and Baichuan M1 (Preview)
  • vercel: v0-1.0-md
  • Apple: Apple On-Device and FastVLM
  • Other: LLaVA-v1.5-7B
  • Tencent: Hunyuan A13B, Hunyuan 80B A13B, Hunyuan T1, and Hunyuan-TurboS
  • Z AI: GLM-4 32B, GLM-4 9B, GLM-4-Air, GLM-4 AirX, GLM-4 FlashX, GLM-4-Long, GLM-4-Plus, GLM-4.1V 9B Thinking, GLM-4.5, GLM-4.5-Air, GLM-4.5V, GLM-4.6, GLM-Z1 32B, GLM-Z1 9B, GLM-Z1 Rumination 32B, and GLM-Zero (Preview)
  • Cohere: Aya Expanse 32B, Aya Expanse 8B, Command, Command A, Command Light, Command R7B, Command-R, Command-R (Mar), Command-R+ (Apr), and Command-R+
  • Bytedance: Duobao 1.5 Pro, Seed-Thinking-v1.5, Skylark Lite, and Skylark Pro
  • AI21 Labs: Jamba 1.5 Large, Jamba 1.5 Large (Feb), Jamba 1.5 Mini, Jamba 1.5 Mini (Feb), Jamba 1.6 Large, Jamba 1.6 Mini, Jamba 1.7 Large, Jamba 1.7 Mini, Jamba Instruct, and Jamba Reasoning 3B
  • Snowflake: Arctic and Snowflake Llama 3.3 70B
  • PaddlePaddle: PaddleOCR-VL-0.9B
  • Alibaba: QwQ-32B, QwQ 32B-Preview, Qwen Chat 14B, Qwen Chat 72B, Qwen Chat 7B, Qwen1.5 Chat 110B, Qwen1.5 Chat 14B, Qwen1.5 Chat 32B, Qwen1.5 Chat 72B, Qwen1.5 Chat 7B, Qwen2 72B, Qwen2 Instruct 7B, Qwen2 Instruct A14B 57B, Qwen2-VL 72B, Qwen2.5 Coder 32B, Qwen2.5 Coder 7B, Qwen2.5 Instruct 14B, Qwen2.5 Instruct 32B, Qwen2.5 72B, Qwen2.5 Instruct 7B, Qwen2.5 Max, Qwen2.5 Max 01-29, Qwen2.5 Omni 7B, Qwen2.5 Plus, Qwen2.5 Turbo, Qwen2.5 VL 72B, Qwen2.5 VL 7B, Qwen3 0.6B, Qwen3 1.7B, Qwen3 14B, Qwen3 235B, Qwen3 235B A22B 2507, Qwen3 235B 2507, Qwen3 30B, Qwen3 30B A3B 2507, Qwen3 32B, Qwen3 4B, Qwen3 4B 2507, Qwen3 8B, Qwen3 Coder 30B A3B, Qwen3 Coder 480B, Qwen3 Max, Qwen3 Max (Preview), Qwen3 Max Thinking, Qwen3 Next 80B A3B, Qwen3 Omni 30B A3B, Qwen3 VL 235B A22B, Qwen3 VL 30B A3B, Qwen3 VL 32B, Qwen3 VL 4B, and Qwen3 VL 8B
  • InclusionAI: Ling-1T, Ling-flash-2.0, Ling-mini-2.0, Ring-1T, and Ring-flash-2.0
  • 01.AI: Yi-Large and Yi-Lightning
  • ByteDance Seed: Doubao Seed Code and Seed-OSS-36B-Instruct
