
Healthcare & Medical Index

Measures performance on capabilities that matter most for clinical and healthcare-support work, including medical knowledge, agentic execution, non-hallucination, multimodal interpretation, reasoning, and long-context reading. Weights are derived from the relative frequency of those capabilities across the top tasks performed by clinicians and healthcare-support staff.


The Artificial Analysis Healthcare & Medical Index combines performance across benchmarks chosen for clinical and healthcare-support work. Weights follow how often each capability appears in example tasks for clinicians and healthcare-support staff. Those tasks are grouped by shared clinical, documentation, and coordination work rather than by role title alone.

Because the score is a composite, a model cannot rank highly by excelling at a single narrow capability. The index provides a single score for tracking model performance across healthcare tasks.

Each capability sub-score is normalised to a 0-100 scale, then combined using the weights below. All underlying benchmarks are run independently by Artificial Analysis. See our Intelligence Benchmarking Methodology for how evaluations are conducted.

Category | Weight | Evaluations
Medical & Health Knowledge | 30% | AA-Omniscience Health Accuracy
Agentic | 20% | GDPval-AA v2
Non-Hallucination | 15% | AA-Omniscience Non-Hallucination
Multimodal | 15% | MMMU Pro
Reasoning | 15% | HLE
Long-Context | 5% | LCR
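The weighting in the table above can be sketched as a plain weighted average. This is a minimal illustration, not Artificial Analysis's implementation: it assumes each sub-score has already been normalised to 0-100 (the normalisation method is not described on this page), and the names `WEIGHTS` and `healthcare_index`, along with the sample sub-scores, are hypothetical.

```python
# Weights from the table above, expressed as fractions of 1.
WEIGHTS = {
    "Medical & Health Knowledge": 0.30,
    "Agentic": 0.20,
    "Non-Hallucination": 0.15,
    "Multimodal": 0.15,
    "Reasoning": 0.15,
    "Long-Context": 0.05,
}


def healthcare_index(sub_scores: dict[str, float]) -> float:
    """Weighted average of sub-scores assumed to be normalised to 0-100."""
    missing = set(WEIGHTS) - set(sub_scores)
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)


# Hypothetical sub-scores, for illustration only (not real results).
example = {
    "Medical & Health Knowledge": 60.0,
    "Agentic": 40.0,
    "Non-Hallucination": 50.0,
    "Multimodal": 70.0,
    "Reasoning": 20.0,
    "Long-Context": 80.0,
}
print(round(healthcare_index(example), 1))  # 51.0
```

Because Medical & Health Knowledge carries 30% of the weight, a 10-point change there moves the composite by 3 points, while the same change in Long-Context moves it by only 0.5.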

Score

Healthcare & Medical Index

Weighted across capabilities relevant to clinical and healthcare-support work · Higher is better
Reasoning models are indicated by a lightbulb icon

Healthcare & Medical Index: Capability Breakdown

Each capability area on a 0–100 scale after normalisation · Higher is better

Capability Breakdown

Healthcare & Medical Index: Medical & Health Knowledge

Models ranked by medical & health knowledge (normalised 0-100) · Higher is better

Representative Workflows

Real-world workflows that exercise the capabilities the Healthcare & Medical Index weights most heavily.

Release Date

Healthcare & Medical Index vs. Release Date


Speed

Healthcare & Medical Index vs. Output Speed

Healthcare & Medical Index · Output tokens per second

There is a trade-off between model quality and output speed: higher-intelligence models typically have lower output speed.

Output speed is the number of tokens received per second while the model is generating tokens (i.e., after the first chunk has been received from the API, for models that support streaming).
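The output speed definition can be illustrated with a short sketch. It assumes the generation window runs from the arrival of the first chunk to the arrival of the last one, and that all output tokens are counted against that window. The exact measurement procedure is not given on this page, and the function name `output_speed` and the timings are hypothetical.

```python
def output_speed(output_tokens: int, first_chunk_time: float, last_chunk_time: float) -> float:
    """Tokens per second over the window that starts at the first chunk.

    Times are seconds since the request was sent. Time spent waiting for the
    first chunk is excluded, as in the definition above.
    """
    window = last_chunk_time - first_chunk_time
    if window <= 0:
        raise ValueError("generation window must be positive")
    return output_tokens / window


# Hypothetical request: first chunk after 0.8 s, last chunk after 5.8 s.
print(output_speed(output_tokens=500, first_chunk_time=0.8, last_chunk_time=5.8))
```

Excluding the wait for the first chunk keeps latency to first token from being mixed into the throughput figure.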

Pricing

Pricing: Input and Output Prices

USD per 1M tokens (blended)

Price per token included in the request or message sent to the API, expressed in USD per million tokens.
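A blended price collapses separate input and output prices into one figure. The sketch below shows one way such a blend could be computed. The blend ratio is not stated on this page, so the 3:1 input-to-output default is purely an illustrative assumption, and `blended_price` is a hypothetical name.

```python
def blended_price(
    input_price: float,
    output_price: float,
    input_parts: float = 3.0,   # assumed ratio, not taken from the page
    output_parts: float = 1.0,  # assumed ratio, not taken from the page
) -> float:
    """Weighted mean of input and output prices, both in USD per 1M tokens."""
    total_parts = input_parts + output_parts
    if total_parts <= 0:
        raise ValueError("ratio parts must sum to a positive number")
    return (input_price * input_parts + output_price * output_parts) / total_parts


# Hypothetical prices: $2 per 1M input tokens, $10 per 1M output tokens.
print(blended_price(input_price=2.0, output_price=10.0))  # 4.0
```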

Figures represent median (P50) measurement over the past 72 hours to reflect sustained changes in performance.
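A rolling median of this kind can be sketched as follows. This is an illustration under stated assumptions, not the site's pipeline: measurements are modelled as (timestamp, value) pairs, and the function name `p50_last_72h` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def p50_last_72h(samples: list[tuple[datetime, float]], now: datetime) -> float:
    """Median (P50) of the values measured in the 72 hours before `now`."""
    cutoff = now - timedelta(hours=72)
    recent = [value for ts, value in samples if cutoff <= ts <= now]
    if not recent:
        raise ValueError("no measurements in the past 72 hours")
    return median(recent)


# Hypothetical measurements, for illustration only.
now = datetime(2025, 1, 10, 12, 0)
samples = [
    (now - timedelta(hours=100), 10.0),  # older than 72 hours, ignored
    (now - timedelta(hours=60), 80.0),
    (now - timedelta(hours=30), 95.0),
    (now - timedelta(hours=2), 300.0),   # one-off spike
]
print(p50_last_72h(samples, now))  # 95.0
```

The median is barely moved by the single spike, which is why a P50 over a multi-day window reflects sustained changes rather than momentary ones.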

Healthcare & Medical Index vs. Price

Healthcare & Medical Index · USD per 1M tokens (blended)

While higher-intelligence models are typically more expensive, they do not all follow the same price-quality curve.

Token Usage

Healthcare & Medical Index: Output Token Composition

Tokens used to run the evaluation

The total number of tokens used to run the evaluation, including input tokens (prompt), reasoning tokens (for reasoning models), and answer tokens (final response).

Cost

Healthcare & Medical Index: Cost Breakdown

Cost (USD) to run the evaluation

The cost to run the evaluation, calculated using the model's input and output token pricing and the number of tokens used.
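The cost calculation can be sketched as token counts multiplied by per-token prices. The sketch assumes prices are quoted in USD per 1M tokens and that reasoning tokens are billed at the output price. Neither detail is confirmed on this page, and `evaluation_cost` and the figures are hypothetical.

```python
def evaluation_cost(
    input_tokens: int,
    reasoning_tokens: int,
    answer_tokens: int,
    input_price: float,
    output_price: float,
) -> float:
    """Cost in USD to run an evaluation; prices are USD per 1M tokens.

    Reasoning tokens are charged at the output price (an assumption).
    """
    output_tokens = reasoning_tokens + answer_tokens
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical run: 2M input, 3M reasoning, and 1M answer tokens
# at $1 per 1M input tokens and $4 per 1M output tokens.
print(evaluation_cost(2_000_000, 3_000_000, 1_000_000, 1.0, 4.0))  # 18.0
```

Under these assumptions a reasoning model can cost far more to evaluate than its list price suggests, because reasoning tokens add to the output total.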

Frequently Asked Questions

What is the Healthcare & Medical Index?

The Healthcare & Medical Index is a composite benchmark from Artificial Analysis that measures performance on capabilities that matter most for clinical and healthcare-support work, including medical knowledge, agentic execution, non-hallucination, multimodal interpretation, reasoning, and long-context reading. Weights are derived from the relative frequency of those capabilities across the top tasks performed by clinicians and healthcare-support staff.

How is the Healthcare & Medical Index calculated?

The Healthcare & Medical Index is calculated as a weighted average of capability sub-scores, each normalised to a 0–100 scale. The sub-scores and their weights are: Medical & Health Knowledge (30%), Agentic (20%), Non-Hallucination (15%), Multimodal (15%), Reasoning (15%), and Long-Context (5%).

Which benchmarks are included in the Healthcare & Medical Index?

The Healthcare & Medical Index includes AA-Omniscience Health Accuracy, GDPval-AA v2, AA-Omniscience Non-Hallucination, MMMU Pro, HLE, and LCR.

Which model has the highest Healthcare & Medical Index score?

Among models with published results, MiniMax-M3 currently has the highest Healthcare & Medical Index score, at 55.

What does a higher Healthcare & Medical Index score mean?

A higher Healthcare & Medical Index score indicates stronger overall performance across the benchmarks that make up the index. For a specific use case, individual benchmark results may be more informative than the composite score.