Comparison of Models: Quality, Performance & Price Analysis
Comparison and analysis of AI models across key metrics, including quality, price, performance and speed (throughput in tokens per second and latency), context window, and others. Click on any model to see detailed metrics. For more details, including our methodology, see our FAQs.
Models compared:
OpenAI: GPT-3.5 Turbo, GPT-3.5 Turbo (0125), GPT-3.5 Turbo (1106), GPT-3.5 Turbo Instruct, GPT-4, GPT-4 Turbo, GPT-4 Turbo (0125), and GPT-4 Vision
Google: Gemini 1.0 Pro, Gemini 1.5 Pro, and Gemma 7B
Meta: Code Llama (70B), Llama 2 Chat (13B), Llama 2 Chat (70B), Llama 2 Chat (7B), Llama 3 (70B), and Llama 3 (8B)
Mistral: Mistral 7B, Mistral Large, Mistral Medium, Mistral Small, Mixtral 8x22B, and Mixtral 8x7B
Anthropic: Claude 2.0, Claude 2.1, Claude 3 Haiku, Claude 3 Opus, Claude 3 Sonnet, and Claude Instant
Cohere: Command, Command Light, Command-R, and Command-R+
Perplexity: PPLX-70B Online and PPLX-7B-Online
xAI: Grok-1
OpenChat: OpenChat 3.5
Microsoft Azure: Phi-3-mini
Databricks: DBRX
Model Comparison Summary
Highlights
Quality & Context window
Quality comparison by ability
Quality vs. Context window, Input token price
Context window
Pricing
Quality vs. Price
Pricing: Input and Output prices
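As an illustrative sketch only (not the site's pricing methodology), per-request cost can be estimated from the input and output prices, which are typically quoted per million tokens. The token counts and prices below are placeholder values, not figures for any listed model.

```python
# Hypothetical example: estimating the cost of a single request from
# per-million-token input and output prices (placeholder values).
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the request cost in USD given token counts and prices per 1M tokens."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# e.g. 1,500 input tokens and 300 output tokens at $0.50 / $1.50 per 1M tokens
print(f"${request_cost(1_500, 300, 0.50, 1.50):.6f}")  # -> $0.001200
```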
Performance summary
Quality vs. Throughput, Price
Throughput vs. Price
Latency vs. Throughput
Latency vs. Throughput: Provider & Model combinations
Quality vs. Throughput: Provider & Model combinations
Speed
Measured by Throughput (tokens per second)
Throughput
Throughput Variance
Throughput Over Time
Latency
Measured by Time (seconds) to First Token
Latency
Latency Variance
Latency Over Time
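For illustration only, the two speed metrics above can be captured in a single streaming pass: latency as the time to the first token, and throughput as output tokens divided by the remaining generation time. The sketch below assumes a hypothetical token-streaming generator and is not the measurement harness behind these charts.

```python
import time

def measure_speed(stream_tokens):
    """Measure latency (seconds to first token) and throughput (output tokens
    per second) from stream_tokens, a hypothetical iterable that yields tokens
    as they arrive from a model API."""
    start = time.perf_counter()
    first_token_time = None
    token_count = 0
    for _ in stream_tokens:
        now = time.perf_counter()
        if first_token_time is None:
            first_token_time = now  # first token received
        token_count += 1
    end = time.perf_counter()
    latency = first_token_time - start            # time to first token
    generation_time = end - first_token_time      # time spent streaming the rest
    throughput = token_count / generation_time if generation_time > 0 else float("nan")
    return latency, throughput
```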
Total Response Time
Time to receive 100 output tokens, calculated from the latency and throughput metrics
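Based on that description, total response time can be reconstructed as the time to first token plus the time to generate 100 tokens at the measured throughput. The figures below are placeholder values, not measured results.

```python
# Hypothetical example combining the latency and throughput metrics above.
def total_response_time(latency_s: float, throughput_tps: float,
                        output_tokens: int = 100) -> float:
    """Time to first token plus the time to generate the output tokens."""
    return latency_s + output_tokens / throughput_tps

# e.g. 0.5 s to first token at 80 tokens/s -> 0.5 + 100/80 = 1.75 s
print(total_response_time(0.5, 80.0))  # 1.75
```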