Microsoft Azure: Model Intelligence, Performance & Price
Analysis of Microsoft Azure's models across key metrics including quality, price, output speed, latency, context window & more. This analysis is intended to support you in choosing the best model provided by Microsoft Azure for your use-case. For more details, including our methodology, see our FAQs. Models analyzed: o1, o3-mini, o1-preview, o1-mini, GPT-4o (Aug '24), GPT-4o (May '24), GPT-4o (Nov '24), GPT-4o mini, o3-mini (high), Llama 3.3 70B, Llama 3.1 405B, Llama 3.1 70B, Llama 3.1 8B, Mistral Large 2 (Nov '24), Mistral Large 2 (Jul '24), DeepSeek R1, Phi-4, Phi-3 Medium 14B, Phi-4 Multimodal, Phi-4 Mini, Command-R+ (Apr '24), Command-R (Mar '24), Jamba 1.5 Large, Jamba 1.5 Mini, GPT-4 Turbo, Llama 3 70B, Llama 3 8B, Mistral Small (Feb '24), Mistral Large (Feb '24), and Jamba Instruct.
Azure Model Comparison Summary
Intelligence: o3-mini (high) and o3-mini are the highest quality models offered by Azure, followed by o1, DeepSeek R1 & o1-mini.

Output Speed (tokens/s): Llama 3.1 8B (217 t/s) and GPT-4o mini (147 t/s) are the fastest models offered by Azure, followed by GPT-4o (May '24), GPT-4o (Nov '24) & GPT-4o (Aug '24).

Latency (seconds): Llama 3.1 8B (0.29s) and Phi-4 Mini (0.35s) are the lowest latency models offered by Azure, followed by Phi-4 Multimodal, Llama 3 8B & Mistral Small (Feb '24).

Blended Price ($/M tokens): DeepSeek R1 ($0.00) and Phi-4 Multimodal ($0.00) are the cheapest models offered by Azure, followed by Phi-4 Mini, Phi-4 & Jamba 1.5 Mini.

Context Window Size: Jamba 1.5 Large (256k) and Jamba 1.5 Mini (256k) are the largest context window models offered by Azure, followed by Jamba Instruct, o1 & o3-mini.

Highlights

Charts compare models on three axes: Intelligence (Artificial Analysis Intelligence Index; higher is better), Speed (output tokens per second; higher is better), and Price (USD per 1M tokens; lower is better).
| Model | Context Window | Intelligence Index | Blended Price (USD/M tokens) | Output Speed (tokens/s) | Latency (s) |
| --- | --- | --- | --- | --- | --- |
| o3-mini (high) | 200k | 66 | $1.93 | 12.9 | 80.53 |
| o3-mini | 200k | 63 | $1.93 | 33.9 | 30.62 |
| o1 | 200k | 62 | $26.25 | 35.0 | 30.26 |
| DeepSeek R1 | 128k | 60 | $0.00 | 13.5 | 0.97 |
| o1-mini | 128k | 54 | $2.12 | 61.2 | 17.11 |
| GPT-4o (Nov '24) | 128k | 41 | $4.38 | 99.1 | 0.97 |
| Llama 3.3 70B | 128k | 41 | $0.71 | 48.2 | 0.44 |
| GPT-4o (Aug '24) | 128k | 41 | $4.38 | 96.0 | 0.80 |
| GPT-4o (May '24) | 128k | 41 | $7.50 | 130.4 | 0.88 |
| Llama 3.1 405B | 128k | 40 | $8.00 | 31.2 | 0.48 |
| Phi-4 | 16k | 40 | $0.22 | 35.4 | 0.48 |
| Mistral Large 2 (Nov '24) | 128k | 38 | $3.00 | 36.2 | 0.54 |
| Mistral Large 2 (Jul '24) | 128k | 37 | $3.00 | 35.3 | 0.52 |
| GPT-4o mini | 128k | 36 | $0.26 | 147.2 | 0.91 |
| Llama 3.1 70B | 128k | 35 | $2.90 | 62.4 | 0.43 |
| Jamba 1.5 Large | 256k | 29 | $3.50 | 51.5 | 0.66 |
| Llama 3 70B | 8k | 27 | $2.90 | 18.9 | 0.77 |
| Mistral Large (Feb '24) | 33k | 26 | $6.00 | 39.9 | 0.50 |
| Phi-3 Medium 14B | 128k | 25 | $0.30 | 49.6 | 0.42 |
| Llama 3.1 8B | 128k | 24 | $0.38 | 216.7 | 0.29 |
| Phi-4 Multimodal | 128k | 23 | $0.00 | 23.3 | 0.35 |
| Phi-4 Mini | 128k | 23 | $0.00 | 52.7 | 0.35 |
| Mistral Small (Feb '24) | 33k | 23 | $1.50 | 54.6 | 0.39 |
| Llama 3 8B | 8k | 21 | $0.38 | 73.8 | 0.39 |
| Command-R+ (Apr '24) | 128k | 20 | $6.00 | 50.6 | 0.57 |
| Jamba 1.5 Mini | 256k | 18 | $0.25 | 82.7 | 0.48 |
| Jamba Instruct | 256k | 16 | $0.55 | 76.5 | 0.53 |
| Command-R (Mar '24) | 128k | 15 | $0.75 | 82.1 | 0.44 |
| o1-preview | 128k | – | $28.88 | 30.4 | 33.92 |
| GPT-4 Turbo | 128k | – | $15.00 | 39.4 | 1.67 |
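The table is most useful when filtered against a concrete use-case. The sketch below is a minimal illustration in Python: the rows are transcribed from the table above, but the thresholds and ranking are arbitrary examples, not a recommendation or part of the Artificial Analysis methodology.

```python
# A few rows transcribed from the table above:
# (model, context_k, intelligence_index, blended_usd_per_m, output_tok_s, latency_s)
MODELS = [
    ("o3-mini",       200, 63, 1.93,  33.9, 30.62),
    ("Llama 3.3 70B", 128, 41, 0.71,  48.2,  0.44),
    ("Phi-4",          16, 40, 0.22,  35.4,  0.48),
    ("GPT-4o mini",   128, 36, 0.26, 147.2,  0.91),
    ("Llama 3.1 8B",  128, 24, 0.38, 216.7,  0.29),
]

# Example use-case: interactive chat needing sub-second latency and a blended
# price under $1.00/M tokens, ranked by intelligence index, then output speed.
shortlist = sorted(
    (m for m in MODELS if m[3] < 1.00 and m[5] < 1.0),
    key=lambda m: (-m[2], -m[4]),
)
for name, ctx_k, idx, price, tps, lat in shortlist:
    print(f"{name}: index {idx}, ${price:.2f}/M, {tps} t/s, {lat}s latency")
```

Swapping the filter predicate and sort key reproduces the other orderings in the summary above, e.g. sorting on the output-speed field ranks models as in the Output Speed highlight.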
Key definitions
Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varies by model).
Output Speed: Tokens per second received while the model is generating tokens (i.e., after the first chunk has been received from the API, for models which support streaming).
Latency: Time, in seconds, from sending the API request to receiving the first token. For models which do not support streaming, this represents the time to receive the full completion.
Price: Price per token, represented as USD per million Tokens. Price is a 3:1 blend of Input & Output token prices (see the worked sketch after these definitions).
Output Price: Price per token generated by the model (received from the API), represented as USD per million Tokens.
Input Price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.
Time period: Metrics are 'live' and are based on the past 72 hours of measurements. Measurements are taken 8 times per day for single requests and twice per day for parallel requests.
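As a worked illustration of the Price, Latency, and Output Speed definitions above, here is a minimal Python sketch. Only the 3:1 blend ratio and the timing definitions come from this page; the example prices and chunk timings are hypothetical placeholders, and this is not Artificial Analysis's actual measurement code.

```python
import math

def blended_price(input_usd_per_m: float, output_usd_per_m: float) -> float:
    """Blend input & output prices at the 3:1 input:output ratio used on this page."""
    return (3 * input_usd_per_m + output_usd_per_m) / 4

# Hypothetical model priced at $2.50/M input and $10.00/M output tokens:
print(blended_price(2.50, 10.00))  # 4.375, i.e. ~$4.38 per 1M blended tokens

def stream_metrics(request_sent: float,
                   chunks: list[tuple[float, int]]) -> tuple[float, float]:
    """Compute (latency_s, output_tokens_per_s) from a streamed response.

    `chunks` is a list of (arrival_time_s, tokens_in_chunk) pairs, e.g. from
    time.monotonic() readings. Latency is time to first token; output speed
    counts only tokens received after the first chunk, per the definitions above.
    """
    first_arrival = chunks[0][0]
    last_arrival = chunks[-1][0]
    latency = first_arrival - request_sent
    tokens_after_first = sum(n for _, n in chunks[1:])
    generating = last_arrival - first_arrival
    speed = tokens_after_first / generating if generating > 0 else math.nan
    return latency, speed

# Hypothetical trace: request sent at t=0.0s, first chunk at t=0.40s,
# then a 5-token chunk every 50 ms.
trace = [(0.40 + 0.05 * i, 5) for i in range(20)]
latency, speed = stream_metrics(0.0, trace)
print(f"latency={latency:.2f}s, speed={speed:.1f} tokens/s")  # 0.40s, 100.0 tokens/s
```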