Hyperbolic: Models Quality, Performance & Price
Analysis of Hyperbolic's models across key metrics including quality, price, output speed, latency, context window & more. This analysis is intended to support you in choosing the best model provided by Hyperbolic for your use case. For more details, including our methodology, see our FAQs. Models analyzed: Llama 3.3 70B, Llama 3.1 405B, Llama 3.1 70B, Llama 3.1 8B, Llama 3.2 3B, Pixtral 12B, DeepSeek R1, DeepSeek V3 (FP8), DeepSeek-V2.5, Qwen2.5 72B, Qwen2.5 Coder 32B, QwQ 32B-Preview, and Llama 3 70B.
Hyperbolic Model Comparison Summary
Quality:
Qwen2.5 72B and Llama 3.1 405B are the highest quality models offered by Hyperbolic, followed by Llama 3.3 70B, Qwen2.5 Coder 32B & Llama 3.1 70B.

Output Speed (tokens/s):
Llama 3.2 3B (201 t/s) and Llama 3.1 8B (115 t/s) are the fastest models offered by Hyperbolic, followed by Pixtral 12B, Qwen2.5 Coder 32B & QwQ 32B-Preview.

Latency (seconds):
Llama 3.1 8B (0.41s) and Llama 3.2 3B (0.43s) are the lowest latency models offered by Hyperbolic, followed by Pixtral 12B, Qwen2.5 Coder 32B & QwQ 32B-Preview.

Blended Price ($/M tokens):
Pixtral 12B ($0.10) and Llama 3.1 8B ($0.10) are the cheapest models offered by Hyperbolic, followed by Llama 3.2 3B, Qwen2.5 Coder 32B & QwQ 32B-Preview.

Context Window Size:
Qwen2.5 72B (131k) and Qwen2.5 Coder 32B (131k) are the largest context window models offered by Hyperbolic, followed by DeepSeek R1, DeepSeek V3 (FP8) & Llama 3.3 70B.





Highlights
Quality: Artificial Analysis Quality Index (higher is better)
Speed: Output Tokens per Second (higher is better)
Price: USD per 1M Tokens (lower is better)
Model | Context Window | Quality Index | Blended Price ($/M tokens) | Output Speed (tokens/s) | Latency (s)
---|---|---|---|---|---
Llama 3.3 70B | 128k | 74 | $0.40 | 24.4 | 0.54
Llama 3.1 405B | 128k | 75 | $4.00 | 7.7 | 0.83
Llama 3.1 70B | 128k | 69 | $0.40 | 25.0 | 0.65
Llama 3.1 8B | 128k | 53 | $0.10 | 115.2 | 0.41
Llama 3.2 3B | 128k | 48 | $0.10 | 200.5 | 0.43
Pixtral 12B | 128k | 57 | $0.10 | 76.7 | 0.45
DeepSeek R1 | 128k | 89 | $2.00 | 13.7 | 104.27
DeepSeek V3 (FP8) | 128k | 79 | $0.25 | 5.5 | 1.56
DeepSeek-V2.5 | 128k | | $2.00 | 7.5 | 0.78
Qwen2.5 72B | 131k | 77 | $0.40 | 27.7 | 0.58
Qwen2.5 Coder 32B | 131k | 72 | $0.20 | 36.6 | 0.48
QwQ 32B-Preview | 33k | | $0.20 | 35.4 | 0.49
Llama 3 70B | 8k | 62 | $0.40 | 32.8 | 0.60
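
For context on how these models are accessed, here is a minimal usage sketch against an OpenAI-compatible endpoint. The base URL, environment variable, and model ID are illustrative assumptions, not confirmed values; consult Hyperbolic's documentation for the exact endpoint and model names.

```python
# Minimal sketch: querying one of the models listed above through an
# OpenAI-compatible endpoint. Base URL, API key variable, and model ID
# are assumptions for illustration -- check Hyperbolic's docs.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hyperbolic.xyz/v1",  # assumed endpoint
    api_key=os.environ["HYPERBOLIC_API_KEY"],  # assumed env variable
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumed model ID
    messages=[{"role": "user", "content": "In one sentence, what is a context window?"}],
    max_tokens=80,
)
print(response.choices[0].message.content)
```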
Key definitions
Artificial Analysis Quality Index: Average result across our evaluations covering different dimensions of model intelligence. Currently includes MMLU, GPQA, Math & HumanEval. OpenAI o1 model figures are preliminary and are based on figures stated by OpenAI. See methodology for more details.
Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varies by model).
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API, for models which support streaming).
Latency: Time to first token received, in seconds, after the API request is sent. For models which do not support streaming, this represents the time to receive the completion.
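
Both streaming metrics can be approximated from any OpenAI-compatible streaming endpoint. The sketch below treats the time to the first content chunk as latency and counts chunks after it as a rough proxy for output tokens; the base URL and model ID are assumptions, and the API's usage field or a tokenizer would give a more exact token count.

```python
# Minimal measurement sketch for Output Speed and Latency as defined
# above. Assumptions (not confirmed values): the base URL, the
# HYPERBOLIC_API_KEY variable, and the model ID. Chunk count is a
# rough proxy for token count.
import os
import time
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hyperbolic.xyz/v1",  # assumed endpoint
    api_key=os.environ["HYPERBOLIC_API_KEY"],
)

start = time.perf_counter()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # assumed model ID
    messages=[{"role": "user", "content": "Count from 1 to 50."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first token observed
        chunks += 1
end = time.perf_counter()

if first_token_at is not None:
    print(f"Latency / time to first token (s): {first_token_at - start:.2f}")
    if chunks > 1 and end > first_token_at:
        # Output speed counts tokens generated after the first chunk arrived.
        print(f"Output speed (approx tokens/s): {(chunks - 1) / (end - first_token_at):.1f}")
```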
Price: Price per token, represented as USD per million tokens. Price is a blend of Input & Output token prices (3:1 input:output ratio).
Output Price: Price per token generated by the model (received from the API), represented as USD per million Tokens.
Input Price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.
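
As a worked example of the blended price above, assuming the 3:1 ratio weights input tokens three parts to output tokens one part, the blend is a simple weighted mean. The sample prices below are illustrative, not Hyperbolic's actual rates.

```python
# Blended price sketch, assuming a 3:1 input:output token weighting.
# Sample prices are illustrative, not Hyperbolic's actual rates.
def blended_price(input_usd_per_m: float, output_usd_per_m: float) -> float:
    """Blended USD per 1M tokens, weighting input 3:1 over output."""
    return (3 * input_usd_per_m + output_usd_per_m) / 4

# e.g. $0.30/M input and $0.60/M output blend to $0.375/M
print(blended_price(0.30, 0.60))  # -> 0.375
```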
Time period: Metrics are 'live' and are based on the past 14 days of measurements. Measurements are taken 8 times per day for single requests and 2 times per day for parallel requests.