Phi-4 Multimodal Instruct: Intelligence, Performance & Price Analysis
Analysis of Microsoft Azure's Phi-4 Multimodal Instruct and comparison to other AI models across key metrics including quality, price, performance (tokens per second and time to first token), context window, and more. For more details, including our methodology, see our FAQs.
Comparison Summary
Intelligence: Phi-4 Multimodal is of lower quality than average, with an MMLU score of 0.485 and an Intelligence Index of 23 across evaluations.
Price:
Speed: Phi-4 Multimodal is slower than average, with an output speed of 25.6 tokens per second.
Latency: Phi-4 Multimodal has lower latency than average, taking 0.39s to receive the first token (TTFT); see the worked latency example after this summary.
Context Window: Phi-4 Multimodal has a smaller context window than average, at 130k tokens.
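As a rough illustration of how the latency and throughput figures above combine, the sketch below estimates end-to-end response time as TTFT plus output tokens divided by output speed. The 0.39s and 25.6 tokens/s values come from the summary; the 500-token response length is an assumed example, not a measured figure.

```python
# Rough end-to-end latency estimate: TTFT + generation time at the reported output speed.
TTFT_SECONDS = 0.39        # time to first token (from the summary above)
TOKENS_PER_SECOND = 25.6   # reported output speed
OUTPUT_TOKENS = 500        # assumed response length, for illustration only

generation_seconds = OUTPUT_TOKENS / TOKENS_PER_SECOND
total_seconds = TTFT_SECONDS + generation_seconds
print(f"Estimated total response time: {total_seconds:.1f}s "
      f"({TTFT_SECONDS}s to first token + {generation_seconds:.1f}s generating)")
```

At these figures a 500-token reply would take roughly 20 seconds end to end, which is why output speed dominates perceived latency for long responses.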
Highlights
The highlight charts cover three metrics: Intelligence (Artificial Analysis Intelligence Index; higher is better), Speed (output tokens per second; higher is better), and Price (USD per 1M tokens; lower is better); a cost sketch follows below.
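Price on these charts is quoted in USD per 1M tokens, typically split across input and output tokens. As a minimal sketch of how such a rate converts into per-request cost, assuming hypothetical placeholder prices rather than Phi-4 Multimodal's actual rates:

```python
# Convert USD-per-1M-token rates into an estimated cost for a single request.
# The prices below are placeholders, not Phi-4 Multimodal's published rates.
INPUT_PRICE_PER_1M = 0.10    # hypothetical USD per 1M input tokens
OUTPUT_PRICE_PER_1M = 0.40   # hypothetical USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one request at the rates above."""
    return (input_tokens * INPUT_PRICE_PER_1M
            + output_tokens * OUTPUT_PRICE_PER_1M) / 1_000_000

print(f"1,000-token prompt with a 500-token reply: ${request_cost(1000, 500):.6f}")
```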
Comparisons to Phi-4 Multimodal
o1
GPT-4o (Nov '24)
GPT-4o mini
o3-mini (high)
GPT-4.5 (Preview)
Llama 3.3 Instruct 70B
Llama 3.1 Instruct 405B
Llama 3.1 Instruct 8B
Gemini 2.0 Pro Experimental (Feb '25)
Gemini 2.0 Flash (Feb '25)
Claude 3.5 Haiku
Claude 3.7 Sonnet (Extended Thinking)
Claude 3.7 Sonnet (Standard)
Mistral Large 2 (Nov '24)
Mistral Small 3
DeepSeek R1
DeepSeek V3
Grok 3
Grok 3 Reasoning Beta
Nova Pro
MiniMax-Text-01
QwQ 32B
Further details