Healthcare & Medical Index
Measures performance on capabilities that matter most for clinical and healthcare-support work, including medical knowledge, agentic execution, non-hallucination, multimodal interpretation, reasoning, and long-context reading. Weights are derived from the relative frequency of those capabilities across the top tasks performed by clinicians and healthcare-support staff.
The Artificial Analysis Healthcare & Medical Index combines performance across benchmarks chosen for clinical and healthcare-support work. Weights follow how often each capability appears in example tasks for clinicians and healthcare-support staff, with tasks grouped by shared clinical, documentation, and coordination work rather than by role title alone.
Because it is a composite, the index does not reward narrow specialization in a single capability, and it provides a single score for tracking model performance across healthcare tasks.
Each capability sub-score is normalised to a 0-100 scale, then combined using the weights below. All underlying benchmarks are run independently by Artificial Analysis. See our Intelligence Benchmarking Methodology for how evaluations are conducted.
| Category | Weight | Evaluations |
|---|---|---|
| Medical & Health Knowledge | 30% | AA-Omniscience Health Accuracy |
| Agentic | 20% | GDPval-AA v2 |
| Non-Hallucination | 15% | AA-Omniscience Non-Hallucination |
| Multimodal | 15% | MMMU Pro |
| Reasoning | 15% | HLE |
| Long-Context | 5% | LCR |
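The combination step can be sketched as a plain weighted average. This is a minimal illustration, not the Artificial Analysis implementation: the weights come from the table above, but the function name `healthcare_index`, the dictionary layout, and the example sub-scores are hypothetical, and the sketch assumes each sub-score has already been normalised to 0-100 (the normalisation method itself is not described here).

```python
# Weights taken from the table above, expressed as fractions of 1.
WEIGHTS = {
    "Medical & Health Knowledge": 0.30,
    "Agentic": 0.20,
    "Non-Hallucination": 0.15,
    "Multimodal": 0.15,
    "Reasoning": 0.15,
    "Long-Context": 0.05,
}


def healthcare_index(sub_scores: dict[str, float]) -> float:
    """Weighted average of capability sub-scores (each assumed to be on 0-100)."""
    missing = WEIGHTS.keys() - sub_scores.keys()
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= sub_scores[name] <= 100.0:
            raise ValueError(f"{name} sub-score must be within 0-100")
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)


# Hypothetical sub-scores for illustration only; not real model results.
example = {
    "Medical & Health Knowledge": 60.0,
    "Agentic": 40.0,
    "Non-Hallucination": 70.0,
    "Multimodal": 50.0,
    "Reasoning": 30.0,
    "Long-Context": 80.0,
}

print(round(healthcare_index(example), 1))  # prints 52.5
```

Because the weights sum to 1, the result stays on the same 0-100 scale as the sub-scores. In this made-up example the Long-Context score of 80 contributes only 4 points, while the Medical & Health Knowledge score of 60 contributes 18.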
Score
[Charts: Healthcare & Medical Index; Healthcare & Medical Index: Capability Breakdown; Healthcare & Medical Index: Medical & Health Knowledge]
Representative Workflows
Real-world workflows that exercise the capabilities the Healthcare & Medical Index weights most heavily.
[Charts: Healthcare & Medical Index vs. Release Date; Healthcare & Medical Index vs. Output Speed; Pricing: Input and Output Prices; Healthcare & Medical Index vs. Price; Healthcare & Medical Index: Output Token Composition; Healthcare & Medical Index: Cost Breakdown]
Frequently Asked Questions
The Healthcare & Medical Index is a composite benchmark from Artificial Analysis that measures performance on capabilities that matter most for clinical and healthcare-support work, including medical knowledge, agentic execution, non-hallucination, multimodal interpretation, reasoning, and long-context reading. Weights are derived from the relative frequency of those capabilities across the top tasks performed by clinicians and healthcare-support staff.
The Healthcare & Medical Index is calculated as a weighted average of capability sub-scores, each normalised to a 0–100 scale. The sub-scores and their weights are: Medical & Health Knowledge (30%), Agentic (20%), Non-Hallucination (15%), Multimodal (15%), Reasoning (15%), and Long-Context (5%).
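Written as a formula, the weighted average looks like the following. The symbols are introduced here for illustration and are not from the source: each $s$ denotes the corresponding capability sub-score, assumed to be already normalised to the 0 to 100 scale.

$$
\text{Index} = 0.30\, s_{\text{knowledge}} + 0.20\, s_{\text{agentic}} + 0.15\, s_{\text{non-halluc}} + 0.15\, s_{\text{multimodal}} + 0.15\, s_{\text{reasoning}} + 0.05\, s_{\text{long-context}}
$$

The coefficients sum to $0.30 + 0.20 + 0.15 + 0.15 + 0.15 + 0.05 = 1$, so the index lies on the same 0 to 100 scale as its inputs.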
The Healthcare & Medical Index includes AA-Omniscience Health Accuracy, GDPval-AA v2, AA-Omniscience Non-Hallucination, MMMU Pro, HLE, and LCR.
Among models with published results, MiniMax-M3 currently has the highest Healthcare & Medical Index score, at 55.
A higher Healthcare & Medical Index score indicates stronger overall performance across the benchmarks that make up the index. For a specific use case, individual benchmark results may be more informative than the composite score.