
LLM Leaderboard - Comparison of over 100 AI models from OpenAI, Google, DeepSeek & others

Comparison and ranking of the performance of over 100 AI models (LLMs) across key metrics including intelligence, price, speed (output speed in tokens per second, and latency as time to first token, TTFT), context window, and others. For more details, including our methodology, see our FAQs.

For a comparison of the API providers hosting these models, see the API Provider comparison.

HIGHLIGHTS

Intelligence: Claude Opus 4.6 (Adaptive) and GPT-5.2 (xhigh) are the highest-intelligence models, followed by Claude Opus 4.5 & GLM-5.
Output Speed (tokens/s): Gemini 2.5 Flash-Lite (Sep) (506 t/s) and Granite 3.3 8B (495 t/s) are the fastest models, followed by Gemini 2.5 Flash-Lite (Sep) & Nova Micro.
Latency (seconds): Apriel-v1.5-15B-Thinker (0.17s) and DeepSeek-OCR (0.19s) are the lowest-latency models, followed by Apriel-v1.6-15B-Thinker & NVIDIA Nemotron Nano 12B v2 VL.
Price ($ per M tokens): Gemma 3n E4B ($0.03) and DeepSeek-OCR ($0.05) are the cheapest models, followed by Llama 3.2 1B & Nova Micro.
Context Window: Llama 4 Scout (10m) and Grok 4.1 Fast (2m) are the largest context window models, followed by Grok 4.1 Fast & Gemini 2.0 Pro Experimental.

Key definitions

Context Window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varying by model).

Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API, for models which support streaming).

Latency (TTFT): Time to first token received, in seconds, after the API request is sent. For reasoning models which share reasoning tokens, this is the time to the first reasoning token. For models which do not support streaming, this represents the time to receive the full completion.
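To make the two streaming metrics concrete, here is a minimal sketch of how TTFT and output speed can be measured against a streaming chat API. It uses the OpenAI Python SDK purely as an illustration; the model name, prompt, and chunk-based token approximation are assumptions, not the measurement harness used for this leaderboard.

```python
import time
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

start = time.monotonic()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical model choice, for illustration only
    messages=[{"role": "user", "content": "Explain TTFT in one sentence."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.monotonic()  # first content chunk = TTFT
        chunks += 1
end = time.monotonic()

if first_token_at is not None:
    ttft = first_token_at - start
    # Output speed counts tokens only after the first chunk arrives. Here we
    # approximate tokens by streamed chunks; a real harness would use a
    # tokenizer or the API's reported usage counts.
    tps = (chunks - 1) / (end - first_token_at) if chunks > 1 else 0.0
    print(f"TTFT: {ttft:.2f}s, output speed: ~{tps:.0f} tokens/s")
```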

Price (Blended): Price per token, represented as USD per million tokens. This is a blend of input & output token prices at a 3:1 input-to-output ratio.
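As a worked example of the blend, assuming the 3:1 ratio means three parts input price to one part output price (the per-million prices below are hypothetical):

```python
def blended_price(input_usd_per_m: float, output_usd_per_m: float) -> float:
    """Blend input and output prices at a 3:1 input:output token ratio."""
    return (3 * input_usd_per_m + output_usd_per_m) / 4

# Hypothetical prices: $0.25/M input, $1.25/M output -> $0.50/M blended
print(blended_price(0.25, 1.25))  # 0.5
```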

Output Price: Price per token generated by the model (received from the API), represented as USD per million tokens.

Input Price: Price per token included in the request/message sent to the API, represented as USD per million tokens.

Metrics are 'live', based on the past 72 hours of measurements; measurements are taken 8 times per day for single requests and 2 times per day for parallel requests.