Analysis of Mistral's Mixtral 8x7B Instruct and comparison to other AI models across key metrics including quality, price, performance (tokens per second & time to first token), context window & more. Click on any model to compare API providers for that model. For more details, including our methodology, see our FAQs.
Quality:
Mixtral 8x7B is of lower quality than average, with an MMLU score of 0.706 and a Quality Index across evaluations of 68.
Price:
Mixtral 8x7B is cheaper than average, with a price of $0.50 per 1M Tokens (blended 3:1). Mixtral 8x7B input token price: $0.50 per 1M Tokens; output token price: $0.50 per 1M Tokens.
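The blended figure simply weights the input price three times as heavily as the output price. A minimal sketch of that calculation in Python (the function name is ours, not from the source):

```python
def blended_price(input_price: float, output_price: float) -> float:
    """Blend per-1M-token prices at a 3:1 input:output ratio (USD)."""
    return (3 * input_price + output_price) / 4

# Mixtral 8x7B: $0.50 input and $0.50 output -> $0.50 blended
print(blended_price(0.50, 0.50))  # 0.5
```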
Speed:
Mixtral 8x7B is faster than average, with a throughput of 103.2 tokens per second.
Latency:
Mixtral 8x7B has lower latency than average, taking 0.32s to receive the first token (TTFT).
Context Window:
Mixtral 8x7B has a smaller context window than average, with a context window of 33k tokens.
Chart: varied metrics by ability categorization; higher is better. Categories: General Ability (Chatbot Arena), Reasoning & Knowledge (MMLU, MT Bench), Coding (HumanEval).
OpenAI's GPT-4 is no longer the clear quality leader following the launch of other models, including Anthropic's Claude 3 Opus and Mistral Large. Models that rival GPT-3.5's performance have also been released, including Gemini Pro, Mixtral 8x7B and DBRX.
Total Response Time: Time to receive a 100 token response. Estimated based on Latency (time to receive first chunk) and Throughput (tokens per second).
Median across providers: Figures represent median (P50) across all providers which support the model.
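Combining the two definitions above: the estimate is TTFT plus 100 tokens divided by throughput, with each figure being the P50 across providers. A minimal sketch using the Mixtral 8x7B medians quoted earlier (the per-provider throughputs are invented for illustration):

```python
from statistics import median

def total_response_time(ttft_s: float, throughput_tps: float, tokens: int = 100) -> float:
    """Estimated time to receive a `tokens`-long response: time to
    first token plus generation time at the given throughput."""
    return ttft_s + tokens / throughput_tps

# Mixtral 8x7B medians: 0.32s TTFT, 103.2 tokens/s
print(f"{total_response_time(0.32, 103.2):.2f}s")  # 1.29s for 100 tokens

# P50 across providers: hypothetical per-provider throughputs (tokens/s)
provider_throughputs = [88.0, 103.2, 121.5]
print(median(provider_throughputs))  # 103.2
```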
Chart: Quality vs. context window and input token price. Quality: general reasoning index; Context window: token limit; Input price: USD per 1M Tokens.
OpenAI's GPT-4 Turbo and Anthropic's Claude models stand out as leaders in offering large context windows. A trade-off between quality and context window size exists between GPT-4 Turbo and Claude 2.1; Claude 2.1 is also marginally cheaper.
Quality: Index represents normalized average relative performance across Chatbot Arena, MMLU & MT-Bench (one possible normalization is sketched after these notes).
Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varied by model).
Input price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.
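The exact normalization behind the Quality index isn't specified here; one plausible reading of "normalized average relative performance" is to min-max scale each benchmark across models and average the results. A sketch under that assumption, with illustrative scores:

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale one benchmark's scores to [0, 1] across models."""
    lo, hi = min(scores.values()), max(scores.values())
    return {model: (s - lo) / (hi - lo) for model, s in scores.items()}

# Illustrative raw scores (model -> score) for each benchmark
benchmarks = {
    "Chatbot Arena": {"GPT-4": 1250, "Mixtral 8x7B": 1120, "GPT-3.5": 1100},
    "MMLU":          {"GPT-4": 0.864, "Mixtral 8x7B": 0.706, "GPT-3.5": 0.700},
    "MT-Bench":      {"GPT-4": 8.99, "Mixtral 8x7B": 8.30, "GPT-3.5": 7.94},
}

models = {m for scores in benchmarks.values() for m in scores}
quality_index = {
    m: sum(normalize(scores)[m] for scores in benchmarks.values()) / len(benchmarks)
    for m in models
}
print(quality_index)  # GPT-4 -> 1.0, GPT-3.5 -> 0.0, Mixtral in between
```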
Context window
Context window: token limit; higher is better
OpenAI's GPT-4 Turbo and Anthropic's Claude models, particularly Claude 2.1, stand out as leaders in offering large context windows.
Higher-quality models are typically more expensive. However, model quality varies significantly, and some open-source models now achieve very high quality.
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
Pricing: Input and Output prices
Chart: input vs. output price per model (USD per 1M Tokens).
Prices vary considerably, including between input and output token prices. GPT-4 stands out as orders of magnitude more expensive than the cheapest models.
Output price: Price per token generated by the model (received from the API), represented as USD per million Tokens.
The speed difference between the fastest and slowest models is more than 3x. There is not always a correlation between parameter count and speed, or between price and speed.
Over time measurement: Median measurement per day, based on 8 measurements taken at different times each day. Labels represent the start of each week's measurements.
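A sketch of that daily aggregation, assuming eight timestamped samples per day (figures invented):

```python
from statistics import median

# Eight throughput measurements (tokens/s) taken at different times in one day
daily_samples = [98.1, 101.4, 103.2, 99.7, 106.0, 102.8, 104.5, 103.6]
print(median(daily_samples))  # 103.0: the day's P50, plotted as one point
```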