
Groq: Model Intelligence, Performance & Price
Analysis of Groq's models across key metrics including quality, price, output speed, latency, context window & more. This analysis is intended to support you in choosing the best model provided by Groq for your use-case. For more details, including our methodology, see our FAQs. Models analyzed: Llama 3.3 70B (Spec decoding), Llama 3.3 70B, Llama 3.2 90B (Vision), Llama 3.2 11B (Vision), Llama 3.1 8B, Llama 3.2 3B, Llama 3.2 1B, Gemma 2 9B, Mistral Saba, DeepSeek R1 Distill Llama 70B, DeepSeek R1 Distill Llama 70B (Spec decoding), DeepSeek R1 Distill Qwen 32B, Qwen2.5 Coder 32B, Qwen2.5 Instruct 32B, QwQ-32B, Llama 3 70B, and Llama 3 8B.
Groq Model Comparison Summary
Intelligence: QwQ-32B and DeepSeek R1 Distill Qwen 32B are the highest quality models offered by Groq, followed by DeepSeek R1 Distill Llama 70B, DeepSeek R1 Distill Llama 70B (Spec decoding) & Llama 3.3 70B (Spec decoding).

Output Speed (tokens/s): Llama 3.2 1B (3,409 t/s) and Llama 3.3 70B (Spec decoding) (1,605 t/s) are the fastest models offered by Groq, followed by Llama 3.2 3B, DeepSeek R1 Distill Llama 70B (Spec decoding) & Llama 3 8B.

Latency (seconds): QwQ-32B (0.10s) and Llama 3.2 11B (Vision) (0.18s) are the lowest latency models offered by Groq, followed by Llama 3.1 8B, Gemma 2 9B & Qwen2.5 Instruct 32B.

Blended Price ($/M tokens): Llama 3.2 1B ($0.04) and Llama 3.1 8B ($0.06) are the cheapest models offered by Groq, followed by Llama 3 8B, Llama 3.2 3B & Llama 3.2 11B (Vision).

Context Window Size: Qwen2.5 Coder 32B (131k) and QwQ-32B (131k) are the largest context window models offered by Groq, followed by Llama 3.3 70B, Llama 3.1 8B & DeepSeek R1 Distill Llama 70B.

Highlights
[Charts: Intelligence (Artificial Analysis Intelligence Index; higher is better), Speed (output tokens per second; higher is better), and Price (USD per 1M tokens; lower is better).]
| Model | Context Window | Intelligence Index | Blended Price ($/M tokens) | Output Speed (tokens/s) | Latency (s) |
|---|---|---|---|---|---|
| QwQ-32B | 131k | 58 | $0.32 | 398.8 | 0.10 |
| DeepSeek R1 Distill Qwen 32B | 128k | 52 | $0.69 | 137.8 | 0.39 |
| DeepSeek R1 Distill Llama 70B | 128k | 48 | $0.81 | 275.4 | 0.34 |
| DeepSeek R1 Distill Llama 70B (Spec decoding) | 128k | 48 | $0.81 | 1,361.7 | 0.41 |
| Llama 3.3 70B (Spec decoding) | 8k | 41 | $0.69 | 1,604.9 | 0.42 |
| Llama 3.3 70B | 128k | 41 | $0.64 | 275.5 | 0.36 |
| Qwen2.5 Instruct 32B | 128k | 37 | $0.79 | 198.0 | 0.22 |
| Qwen2.5 Coder 32B | 131k | 36 | $0.79 | 197.4 | 0.36 |
| Llama 3.2 90B (Vision) | 8k | 33 | $0.90 | 263.9 | 0.32 |
| Llama 3 70B | 8k | 27 | $0.64 | 348.2 | 0.29 |
| Llama 3.1 8B | 128k | 24 | $0.06 | 751.1 | 0.20 |
| Gemma 2 9B | 8k | 22 | $0.20 | 651.5 | 0.21 |
| Llama 3 8B | 8k | 21 | $0.06 | 1,200.8 | 0.30 |
| Llama 3.2 3B | 8k | 20 | $0.06 | 1,537.5 | 0.34 |
| Llama 3.2 1B | 8k | 10 | $0.04 | 3,408.7 | 0.46 |
| Llama 3.2 11B (Vision) | 8k | | $0.18 | 750.2 | 0.18 |
| Mistral Saba | 32k | | $0.79 | 384.8 | 0.34 |
Key definitions
Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varies by model).
Output Speed: Tokens per second received while the model is generating tokens (i.e., after the first chunk has been received from the API, for models which support streaming).
Latency: Time to first token received, in seconds, after the API request is sent. For models which do not support streaming, this represents the time to receive the completion.
Price: Price per token, represented as USD per million tokens. Price is a blend of input & output token prices (3:1 input:output ratio); a worked example appears after these definitions.
Output Price: Price per token generated by the model (received from the API), represented as USD per million tokens.
Input Price: Price per token included in the request/message sent to the API, represented as USD per million tokens.
Time period: Metrics are 'live' and are based on the past 72 hours of measurements; measurements are taken 8 times per day for single requests and 2 times per day for parallel requests.
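
To make the blended price, output speed, and latency definitions above concrete, here is a minimal sketch of how these metrics can be computed. The helper function names are hypothetical, and the example per-token prices (roughly Groq's published Llama 3.1 8B rates at the time of writing) are illustrative assumptions, not Artificial Analysis' actual measurement code.

```python
# Minimal sketch of the metric definitions above (hypothetical helper names).

def blended_price(input_price_per_m: float, output_price_per_m: float) -> float:
    """Blend input & output prices at a 3:1 input:output ratio (USD per 1M tokens)."""
    return (3 * input_price_per_m + 1 * output_price_per_m) / 4

def output_speed(output_tokens: int, first_token_time: float, last_token_time: float) -> float:
    """Tokens per second while generating, i.e. measured after the first chunk arrives."""
    return output_tokens / (last_token_time - first_token_time)

def latency(request_sent_time: float, first_token_time: float) -> float:
    """Time to first token, in seconds, after the API request is sent."""
    return first_token_time - request_sent_time

# Illustrative Llama 3.1 8B rates of $0.05 (input) and $0.08 (output) per 1M tokens
# blend to roughly $0.0575 per 1M tokens, shown as $0.06 in the table above.
print(blended_price(0.05, 0.08))

# Example timings: 500 output tokens streamed over 0.65 s, after a 0.20 s wait
# for the first chunk -> ~769 tokens/s output speed and 0.20 s latency.
print(round(output_speed(500, 0.20, 0.85), 1))
print(latency(0.0, 0.20))
```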