AI Hardware Benchmarking & Performance Analysis
We measure real-world performance of AI accelerator systems during language model inference.
Our new real-world hardware performance test, AgentPerf, is now open for submissions — all hardware providers are welcome to get in touch via agentperf@artificialanalysis.ai.
For details regarding the methodology, see our methodology section. Benchmarks are conducted periodically, at least once per quarter, and benchmark specifications are shared in the System & Benchmark Specifications section below. For model benchmarks, see our LLM model comparison.

Peak System Output Throughput, Llama 3.3 70B
Total System Output Tokens per Second; Higher is better
Peak Output Speed per Query, Llama 3.3 70B
Output Tokens per Second per Query; Higher is better
Rental Price (On-Demand)
Minimum Rental Price per GPU per Hour, USD; Lower is better
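The two speed metrics above are distinct: output speed per query is what a single user experiences, while system output throughput sums tokens across all concurrent queries. A minimal sketch of how these could be computed from streamed token counts and timings (helper names are illustrative, not the actual benchmark harness):

```python
import time


def measure_query(stream):
    """Consume one query's token stream; return (output_tokens, seconds)."""
    start = time.perf_counter()
    n = 0
    for _ in stream:
        n += 1
    return n, time.perf_counter() - start


def per_query_output_speed(tokens, seconds):
    # Output tokens per second as experienced by a single query.
    return tokens / seconds


def system_output_throughput(per_query_results):
    # Total output tokens across all concurrent queries, divided by the
    # wall-clock window (approximated here by the longest query duration).
    total_tokens = sum(t for t, _ in per_query_results)
    window = max(s for _, s in per_query_results)
    return total_tokens / window
```

For example, two concurrent queries producing 500 and 400 tokens over a 5-second window give a per-query speed of 100 and 80 tokens/s respectively, but a system throughput of 180 tokens/s.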
System Output Throughput at 100 Tokens/s Per-Query Output Speed
gpt-oss-120B (high) | System Output Throughput (Tokens per Second) at 100 tokens/s per-query output speed
Peak Output Speed per Query
gpt-oss-120B (high) | Peak Output Speed per Query (Tokens per Second)
System Output Throughput vs. Output Speed per Query
gpt-oss-120B (high) | System Output Throughput (Tokens per Second) vs. Output Speed per Query (Tokens per Second)
Systems compared: 8xH100 (vLLM), 8xH200 (vLLM), 8xB200 (TensorRT-LLM, Optimal), 8xMI300X (vLLM)
System Output Throughput & Output Speed per Query vs. Concurrency
gpt-oss-120B (high) | System Output Throughput (Tokens per Second) & Output Speed per Query (Tokens per Second)
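These two curves trade off against each other because, at steady state, system throughput is approximately concurrency multiplied by per-query output speed: raising concurrency increases total throughput while each individual query slows down. A sketch of this idealized identity (it ignores scheduling overhead and batching effects):

```python
def system_throughput(concurrency, per_query_speed):
    # Idealized steady-state relationship: total output tokens/s across
    # the system is the number of concurrent queries times the speed
    # each query streams at.
    return concurrency * per_query_speed


# 32 concurrent queries, each streaming at 100 tokens/s,
# yield roughly 3,200 total output tokens per second.
```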
Cost per Million Input and Output Tokens at 100 Tokens/s Per-Query Output Speed
gpt-oss-120B (high) | Cost per One Million Input and One Million Output Tokens (USD) at 100 tokens/s per-query output speed
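For a self-hosted system, cost per million output tokens can be estimated from the GPU rental price and the measured system throughput. A simplified sketch (the chart's figures also account for input tokens, which this illustration omits):

```python
def cost_per_million_output_tokens(price_per_gpu_hour, num_gpus,
                                   system_throughput_tps):
    """Estimate USD cost per one million output tokens."""
    # Tokens the whole system produces in one hour at this throughput.
    tokens_per_hour = system_throughput_tps * 3600
    # Hourly rental cost of the full system.
    hourly_cost = price_per_gpu_hour * num_gpus
    # Cost divided by millions of tokens produced per hour.
    return hourly_cost / (tokens_per_hour / 1_000_000)


# 8 GPUs at $2/hr each, sustaining 4,000 tokens/s system-wide:
# $16/hr over 14.4M tokens/hr, i.e. about $1.11 per million output tokens.
```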
End-to-End Latency vs. Concurrency
gpt-oss-120B (high) | End-to-End Latency (s) vs. Concurrency
Price per GPU Hour (On-Demand)
Leading cloud GPU provider endpoints; Price in USD
Runpod
Crusoe Cloud
DigitalOcean
Amazon Web Services
Google Cloud
Nebius
Microsoft Azure
Lambda
CoreWeave