
Novita: Model Quality, Performance & Price
Analysis of Novita's models across key metrics including quality, price, output speed, latency, context window & more. This analysis is intended to support you in choosing the best model provided by Novita for your use-case. For more details, including our methodology, see our FAQs. Models analyzed: Llama 3.3 70B, Llama 3.1 70B, Llama 3.1 8B, Llama 3.2 3B, DeepSeek R1, DeepSeek R1 Distill Llama 70B, DeepSeek V3, Llama 3 70B, Llama 3 8B, and Mistral 7B.
Novita Model Comparison Summary
Quality: DeepSeek R1 and DeepSeek R1 Distill Llama 70B are the highest quality models offered by Novita, followed by DeepSeek V3, Llama 3.3 70B & Llama 3.1 70B.
Output Speed (tokens/s): Llama 3.2 3B (123 t/s) and Mistral 7B (96 t/s) are the fastest models offered by Novita, followed by Llama 3.1 8B, Llama 3 8B & Llama 3.3 70B.
Latency (seconds): Llama 3.1 8B (0.55s) and Llama 3 8B (0.57s) are the lowest latency models offered by Novita, followed by Llama 3.2 3B, Llama 3.3 70B & DeepSeek V3.
Blended Price ($/M tokens): Llama 3.2 3B ($0.04) and Llama 3 8B ($0.04) are the cheapest models offered by Novita, followed by Llama 3.1 8B, Mistral 7B & Llama 3.1 70B.
Context Window Size: Llama 3.3 70B (128k) and DeepSeek R1 (64k) are the largest context window models offered by Novita, followed by DeepSeek V3, DeepSeek R1 Distill Llama 70B & Llama 3.1 70B.

Highlights
[Charts: Quality (Artificial Analysis Quality Index; higher is better), Speed (Output Tokens per Second; higher is better), and Price (USD per 1M Tokens; lower is better).]
| Model | Context Window | Quality Index | Blended Price ($/M tokens) | Output Speed (tokens/s) | Latency (s) |
|---|---|---|---|---|---|
| Llama 3.3 70B | 128k | 74 | $0.39 | 63.4 | 0.83 |
| Llama 3.1 70B | 32k | 68 | $0.35 | 63.0 | 1.03 |
| Llama 3.1 8B | 16k | 53 | $0.05 | 73.9 | 0.55 |
| Llama 3.2 3B | 32k | 48 | $0.04 | 123.2 | 0.60 |
| DeepSeek R1 | 64k | 89 | $4.00 | 16.3 | 85.59 |
| DeepSeek R1 Distill Llama 70B | 32k | 85 | $0.39 | 15.5 | 30.62 |
| DeepSeek V3 | 64k | 79 | $0.89 | 9.2 | 0.88 |
| Llama 3 70B | 8k | 62 | $0.57 | 23.5 | 1.35 |
| Llama 3 8B | 8k | 53 | $0.04 | 71.3 | 0.57 |
| Mistral 7B | 32k | 33 | $0.06 | 95.5 | 0.99 |
Key definitions
Artificial Analysis Quality Index: Average result across our evaluations covering different dimensions of model intelligence. Currently includes MMLU, GPQA, Math & HumanEval. OpenAI o1 model figures are preliminary and are based on figures stated by OpenAI. See methodology for more details.
Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varies by model).
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API, for models which support streaming). A measurement sketch follows these definitions.
Latency: Time to first token received, in seconds, after the API request is sent. For models which do not support streaming, this represents the time to receive the completion.
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 input:output ratio); a worked example follows these definitions.
Output Price: Price per token generated by the model (received from the API), represented as USD per million Tokens.
Input Price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.
Time period: Metrics are 'live' and are based on the past 14 days of measurements. Measurements are taken 8 times per day for single requests and 2 times per day for parallel requests.
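
To make the blended price concrete, here is a minimal sketch of the 3:1 blend described above. Only the weighting comes from the definition; the function name and the example prices are illustrative assumptions, not Novita's actual per-token rates.

```python
def blended_price(input_price: float, output_price: float) -> float:
    """Blend input and output token prices at a 3:1 input:output ratio.

    All prices are USD per 1M tokens, matching the Price definition above.
    """
    return (3 * input_price + 1 * output_price) / 4


# Hypothetical prices for illustration only (USD per 1M tokens).
print(blended_price(1.00, 2.00))  # (3 * 1.00 + 2.00) / 4 = 1.25
```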
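
The Output Speed and Latency definitions can also be read as a measurement recipe: record the time from sending the request to the first streamed chunk (latency), then divide tokens generated by the time spent generating (output speed). Below is a minimal sketch assuming an OpenAI-compatible streaming endpoint; the base URL, model name, and the one-chunk-per-token approximation are assumptions, not confirmed Novita details.

```python
import os
import time

from openai import OpenAI  # pip install openai

# Placeholder endpoint and model; substitute real values for your provider.
client = OpenAI(base_url="https://example.com/v1", api_key=os.environ["API_KEY"])

start = time.monotonic()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="example-model",
    messages=[{"role": "user", "content": "Write a haiku about latency."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.monotonic()  # latency: time to first token
        chunks += 1  # rough proxy: one streamed chunk ~ one token

end = time.monotonic()
print(f"Latency (time to first token): {first_token_at - start:.2f}s")
# Output speed counts only generation time, i.e. after the first chunk arrives.
print(f"Output speed: {chunks / (end - first_token_at):.1f} tokens/s")
```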