DBRX Instruct: Quality, Performance & Price Analysis
Analysis of Databricks's DBRX Instruct and comparison to other AI models across key metrics including quality, price, performance (tokens per second & time to first token), context window & more. Click on any model to compare API providers for that model. For more details including relating to our methodology, see our FAQs.
Comparison Summary
Quality:
DBRX is of higher quality compared to average, with a MMLU score of 0.737 and a Quality Index across evaluations of 76.
Price:DBRX is cheaper compared to average with a price of $1.40 per 1M Tokens (blended 3:1).
DBRX Input token price: $1.40, Output token price: $1.40 per 1M Tokens.
Speed:DBRX Input token price: $1.40, Output token price: $1.40 per 1M Tokens.
DBRX is faster compared to average, with a throughput of 77.9 tokens per second.
Latency:DBRX has a lower latency compared to average, taking 0.47s to receive the first token (TTFT).
Context Window:DBRX has a smaller context windows than average, with a context window of 33k tokens.
Highlights
Quality
Quality Index; Higher is better
Speed
Throughput in Tokens per Second; Higher is better
Price
USD per 1M Tokens; Lower is better
Parallel Queries: (Beta)
Prompt Length:
Quality & Context window
Quality comparison by ability
Varied metrics by ability categorization; Higher is better
General Ability (Chatbot Arena)
Reasoning & Knowledge (MMLU)
Reasoning & Knowledge (MT Bench)
Coding (HumanEval)
Total Response Time: Time to receive a 100 token response. Estimated based on Latency (time to receive first chunk) and Throughput (tokens per second).
Median across providers: Figures represent median (P50) across all providers which support the model.
Quality vs. Context window, Input token price
Quality: General reasoning index, Context window: Tokens limit, Input Price: USD per 1M Tokens
Most attractive quadrant
Size represents Input Price (USD per M Tokens)
Quality: Index represents normalized average relative performance across Chatbot arena, MMLU & MT-Bench.
Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varied by model).
Input price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.
Context window
Context window: Tokens limit; Higher is better
Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varied by model).
Pricing
Quality vs. Price
Quality: General reasoning index, Price: USD per 1M Tokens
Most attractive quadrant
Quality: Index represents normalized average relative performance across Chatbot arena, MMLU & MT-Bench.
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
Median across providers: Figures represent median (P50) across all providers which support the model.
Pricing: Input and Output prices
USD per 1M Tokens
Input price
Output price
Input price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.
Output price: Price per token generated by the model (received from the API), represented as USD per million Tokens.
Median across providers: Figures represent median (P50) across all providers which support the model.
Pricing comparison of DBRX Instruct API providers
Performance summary
Quality vs. Throughput, Price
Quality: General reasoning index, Throughput: Tokens per Second, Price: USD per 1M Tokens
Most attractive quadrant
Size represents Price (USD per M Tokens)
Quality: Index represents normalized average relative performance across Chatbot arena, MMLU & MT-Bench.
Throughput: Tokens per second received while the model is generating tokens (ie. after first chunk has been received from the API).
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
Throughput vs. Price
Throughput: Tokens per Second, Price: USD per 1M Tokens
Most attractive quadrant
Throughput: Tokens per second received while the model is generating tokens (ie. after first chunk has been received from the API).
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
Latency vs. Throughput
Latency: Seconds to First Tokens Chunk Received, Throughput: Tokens per Second
Most attractive quadrant
Size represents Price (USD per M Tokens)
Throughput: Tokens per second received while the model is generating tokens (ie. after first chunk has been received from the API).
Latency: Time to first token of tokens received, in seconds, after API request sent.
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
Median across providers: Figures represent median (P50) across all providers which support the model.
Latency vs. Throughput: Provider & Model combinations
Latency: Seconds to First Tokens Chunk Received, Throughput: Tokens per Second
Most attractive quadrant
Size represents Price (USD per M Tokens)
DBRX Instruct (Lepton AI)
DBRX Instruct (Fireworks)
DBRX Instruct (Databricks)
DBRX Instruct (Together.ai)
Throughput: Tokens per second received while the model is generating tokens (ie. after first chunk has been received from the API).
Latency: Time to first token of tokens received, in seconds, after API request sent.
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
Median across providers: Figures represent median (P50) across all providers which support the model.
Quality vs. Throughput: Provider & Model combinations
Quality: General reasoning index, Throughput: Tokens per Second, Price: USD per 1M Tokens
Most attractive quadrant
Size represents Price (USD per M Tokens)
DBRX Instruct (Lepton AI)
DBRX Instruct (Fireworks)
DBRX Instruct (Databricks)
DBRX Instruct (Together.ai)
Throughput: Tokens per second received while the model is generating tokens (ie. after first chunk has been received from the API).
Latency: Time to first token of tokens received, in seconds, after API request sent.
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
Median across providers: Figures represent median (P50) across all providers which support the model.
Speed
Measured by Throughput (tokens per second)
Throughput
Output Tokens per Second; Higher is better
Throughput: Tokens per second received while the model is generating tokens (ie. after first chunk has been received from the API).
Median across providers: Figures represent median (P50) across all providers which support the model.
Throughput Variance
Output Tokens per Second; Results by percentile; Higher median is better
Median, Other points represent 5th, 25th, 75th, 95th Percentiles respectively
Throughput: Tokens per second received while the model is generating tokens (ie. after first chunk has been received from the API).
Boxplot: Shows variance of measurements
Throughput, Over Time
Output Tokens per Second; Higher is better
Throughput: Tokens per second received while the model is generating tokens (ie. after first chunk has been received from the API).
Over time measurement: Median measurement per day, based on 8 measurements each day at different times. Labels represent start of week's measurements.
Median across providers: Figures represent median (P50) across all providers which support the model.
Latency
Measured by Time (seconds) to First Token
Latency
Seconds to First Tokens Chunk Received; Lower is better
Latency: Time to first token of tokens received, in seconds, after API request sent.
Median across providers: Figures represent median (P50) across all providers which support the model.
Latency Variance
Seconds to First Tokens Chunk Received; Results by percentile; Lower median is better
Median, Other points represent 5th, 25th, 75th, 95th Percentiles respectively
Latency: Time to first token of tokens received, in seconds, after API request sent.
Boxplot: Shows variance of measurements
Latency, Over Time
Seconds to First Tokens Chunk Received; Lower median is better
Latency: Time to first token of tokens received, in seconds, after API request sent.
Over time measurement: Median measurement per day, based on 8 measurements each day at different times. Labels represent start of week's measurements.
Median across providers: Figures represent median (P50) across all providers which support the model.
Total Response Time
Time to receive 100 tokens output, calculated by latency and throughput metrics
Total Response Time
Seconds to Output 100 Tokens; Lower is better
Total Response Time: Time to receive a 100 token response. Estimated based on Latency (time to receive first chunk) and Throughput (tokens per second).
Median across providers: Figures represent median (P50) across all providers which support the model.
Total Response Time, Over Time
Seconds to Output 100 Tokens; Lower is better
Total Response Time: Time to receive a 100 token response. Estimated based on Latency (time to receive first chunk) and Throughput (tokens per second).
Over time measurement: Median measurement per day, based on 8 measurements each day at different times. Labels represent start of week's measurements.
Median across providers: Figures represent median (P50) across all providers which support the model.
Further details