Artificial Analysis

Hyperbolic: Models Intelligence, Performance & Price

Analysis of Hyperbolic's models across key metrics, including quality, price, output speed, latency, and context window. This analysis is intended to help you choose the best model provided by Hyperbolic for your use case. For more details, including our methodology, see our FAQs. Models analyzed: Llama 3.3 70B, Llama 3.1 405B, Llama 3.1 70B, Llama 3.1 8B, Llama 3.2 3B, Pixtral 12B, DeepSeek R1, DeepSeek V3 (FP8), Qwen2.5 72B, Qwen2.5 Coder 32B, QwQ-32B, Llama 3 70B, and QwQ 32B-Preview.
Hyperbolic Model Comparison Summary

Intelligence: DeepSeek R1 and DeepSeek V3 (FP8) are the highest quality models offered by Hyperbolic, followed by QwQ 32B-Preview, Llama 3.3 70B & Llama 3.1 405B.
Output Speed (tokens/s): Llama 3.2 3B (212 t/s) and Llama 3.1 70B (88 t/s) are the fastest models offered by Hyperbolic, followed by DeepSeek R1, Pixtral 12B & Llama 3.1 8B.
Latency (seconds): Pixtral 12B (0.34s) and QwQ-32B (0.37s) are the lowest latency models offered by Hyperbolic, followed by Llama 3.1 405B, QwQ 32B-Preview & Llama 3.2 3B.
Blended Price ($/M tokens): Llama 3.1 8B ($0.10) and Llama 3.2 3B ($0.10) are the cheapest models offered by Hyperbolic, followed by Pixtral 12B, Qwen2.5 Coder 32B & QwQ-32B.
Context Window Size: Qwen2.5 72B (131k) and Qwen2.5 Coder 32B (131k) are the largest context window models offered by Hyperbolic, followed by QwQ-32B, Llama 3.3 70B & Llama 3.1 405B.

Highlights

Intelligence: Artificial Analysis Intelligence Index; higher is better
Speed: Output tokens per second; higher is better
Price: USD per 1M tokens; lower is better
| API Provider | Creator | Model | Context window | Intelligence Index | Blended price (USD per 1M tokens) | Output tokens/s | Latency (s) |
|---|---|---|---|---|---|---|---|
| Hyperbolic | DeepSeek | DeepSeek R1 | 128k | 60 | $2.00 | 82.7 | 2.29 |
| Hyperbolic (FP8) | DeepSeek | DeepSeek V3 (FP8) | 128k | 46 | $0.25 | 39.2 | 1.11 |
| Hyperbolic | Alibaba | QwQ 32B-Preview | 33k | 43 | $0.20 | 66.7 | 0.92 |
| Hyperbolic | Meta | Llama 3.3 70B | 128k | 41 | $0.40 | 42.4 | 1.29 |
| Hyperbolic | Meta | Llama 3.1 405B | 128k | 40 | $4.00 | 6.8 | 0.75 |
| Hyperbolic | Alibaba | Qwen2.5 72B | 131k | 40 | $0.40 | 19.3 | 1.87 |
| Hyperbolic | Alibaba | Qwen2.5 Coder 32B | 131k | 36 | $0.20 | 52.4 | 1.00 |
| Hyperbolic | Meta | Llama 3.1 70B | 128k | 35 | $0.40 | 88.5 | 1.16 |
| Hyperbolic | Meta | Llama 3 70B | 8k | 27 | $0.40 | 21.2 | 1.56 |
| Hyperbolic | Meta | Llama 3.1 8B | 128k | 24 | $0.10 | 67.9 | 0.96 |
| Hyperbolic | Mistral | Pixtral 12B | 128k | 23 | $0.10 | 78.1 | 0.34 |
| Hyperbolic | Meta | Llama 3.2 3B | 128k | 20 | $0.10 | 212.2 | 0.95 |
| Hyperbolic | Alibaba | QwQ-32B | 131k | - | $0.20 | 34.8 | 0.37 |
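The figures above can be combined into a rough shortlist for a given workload. The following is a minimal sketch, not part of the original analysis: the `ModelRow` type, the `shortlist` helper, and the response-time estimate (latency plus output tokens divided by output speed) are illustrative assumptions, and the estimate treats output speed as constant over the whole response. The numbers are copied from the table; QwQ-32B has no Intelligence Index listed, so it is stored as `None` and skipped by any filter on intelligence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelRow:
    """One row of the comparison table (hypothetical helper type)."""
    name: str
    context_k: int                # context window, in thousands of tokens
    intelligence: Optional[int]   # None where the table lists no value
    price_usd_per_m: float        # blended price, USD per 1M tokens
    tokens_per_s: float           # output speed
    latency_s: float              # time to first token


ROWS = [
    ModelRow("DeepSeek R1", 128, 60, 2.00, 82.7, 2.29),
    ModelRow("DeepSeek V3 (FP8)", 128, 46, 0.25, 39.2, 1.11),
    ModelRow("QwQ 32B-Preview", 33, 43, 0.20, 66.7, 0.92),
    ModelRow("Llama 3.3 70B", 128, 41, 0.40, 42.4, 1.29),
    ModelRow("Llama 3.1 405B", 128, 40, 4.00, 6.8, 0.75),
    ModelRow("Qwen2.5 72B", 131, 40, 0.40, 19.3, 1.87),
    ModelRow("Qwen2.5 Coder 32B", 131, 36, 0.20, 52.4, 1.00),
    ModelRow("Llama 3.1 70B", 128, 35, 0.40, 88.5, 1.16),
    ModelRow("Llama 3 70B", 8, 27, 0.40, 21.2, 1.56),
    ModelRow("Llama 3.1 8B", 128, 24, 0.10, 67.9, 0.96),
    ModelRow("Pixtral 12B", 128, 23, 0.10, 78.1, 0.34),
    ModelRow("Llama 3.2 3B", 128, 20, 0.10, 212.2, 0.95),
    ModelRow("QwQ-32B", 131, None, 0.20, 34.8, 0.37),
]


def estimated_response_time(row: ModelRow, output_tokens: int) -> float:
    """Assumed model: wait for the first token, then stream at constant speed."""
    return row.latency_s + output_tokens / row.tokens_per_s


def shortlist(rows: List[ModelRow], min_intelligence: int,
              max_price: float, output_tokens: int) -> List[ModelRow]:
    """Keep rows meeting the quality and price limits, fastest estimate first."""
    eligible = [
        r for r in rows
        if r.intelligence is not None
        and r.intelligence >= min_intelligence
        and r.price_usd_per_m <= max_price
    ]
    return sorted(eligible, key=lambda r: estimated_response_time(r, output_tokens))


if __name__ == "__main__":
    for row in shortlist(ROWS, min_intelligence=40, max_price=0.40, output_tokens=500):
        print(f"{row.name}: ~{estimated_response_time(row, 500):.1f}s")
```

Because the metrics are live, the hard-coded rows are only a snapshot; the thresholds and the 500-token response length are placeholders to adjust for a real workload.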

Key definitions

Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varies by model).
Output Speed: Tokens per second received while the model is generating tokens (i.e., after the first chunk has been received from the API, for models which support streaming).
Latency: Time to first token received, in seconds, after the API request is sent. For models which do not support streaming, this represents the time to receive the completion.
Price: Price per token, represented as USD per million tokens. The price shown is a blend of input & output token prices (3:1 ratio).
Output Price: Price per token generated by the model (received from the API), represented as USD per million Tokens.
Input Price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.
Time period: Metrics are 'live' and are based on the past 72 hours of measurements. Measurements are taken 8 times a day for single requests and 2 times a day for parallel requests.
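The price, latency, and output speed definitions above can be sketched in a few lines. This is an illustrative reading rather than the published methodology: it assumes the 3:1 ratio means three parts input price to one part output price, and that output speed counts only tokens that arrive after the first chunk. The function names and the sample numbers are hypothetical and are not Hyperbolic's actual prices or measurements.

```python
from typing import Sequence


def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output prices (USD per 1M tokens).

    Assumption: the 3:1 ratio weights input tokens three times as heavily
    as output tokens.
    """
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


def latency_seconds(request_sent: float, chunk_times: Sequence[float]) -> float:
    """Time from sending the request to receiving the first chunk."""
    return chunk_times[0] - request_sent


def output_speed(chunk_times: Sequence[float],
                 chunk_token_counts: Sequence[int]) -> float:
    """Tokens per second measured after the first chunk has arrived.

    Assumption: tokens in the first chunk are excluded, since the clock
    starts when that chunk is received.
    """
    elapsed = chunk_times[-1] - chunk_times[0]
    if elapsed <= 0:
        raise ValueError("need at least two chunks with increasing timestamps")
    return sum(chunk_token_counts[1:]) / elapsed


if __name__ == "__main__":
    # Hypothetical prices: $0.20 per 1M input tokens, $0.60 per 1M output tokens.
    print(f"blended: ${blended_price(0.20, 0.60):.2f} per 1M tokens")

    # Hypothetical stream: request sent at t=0.0s, four chunks received.
    times = [0.5, 1.0, 1.5, 2.5]
    tokens = [1, 10, 10, 20]
    print(f"latency: {latency_seconds(0.0, times):.2f}s")
    print(f"output speed: {output_speed(times, tokens):.1f} tokens/s")
```

Separating latency from output speed this way matches the table's two columns: a model can start responding quickly yet stream slowly, or the reverse.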