AI Hardware Benchmarking & Performance Analysis
We measure real-world performance of AI accelerator systems during language model inference.
For language model intelligence benchmarks or API performance benchmarks, see the language model comparisons.
System Load Test (AA-SLT)
Our original hardware benchmark, covering a wide range of systems.
Highlights
Throughput
System Output Throughput at 100 tokens/s Per Query Output Speed
Output Speed
Peak Output Speed per Query
Throughput vs Speed
System Output Throughput vs. Output Speed per Query
System Output Throughput & Output Speed per Query vs. Concurrency
Cost
Cost per Million Input and Output Tokens at 100 tokens/s Per Query Output Speed
Concurrency
End-to-End Latency vs. Concurrency
Pricing
Price per GPU Hour (On-Demand)
Frequently Asked Questions
In the current Artificial Analysis System Load Test (AA-SLT), NVIDIA's B200 is the most performant accelerator for LLM inference. It leads on both peak throughput and output speed per query, though the right choice still varies by model, deployment goal, and budget.
NVIDIA's B200 powers the highest-throughput result in the current Artificial Analysis System Load Test (AA-SLT). The top result is 8xB200 (SXM) serving gpt-oss-120b (high), which reaches 92,909 output tokens per second at peak throughput.
NVIDIA's B200 powers the fastest single-query result in the current Artificial Analysis System Load Test (AA-SLT). The top result is 8xB200 (SXM) serving gpt-oss-120b (high), which reaches 403 output tokens per second per query.
8xB200 (SXM) serving gpt-oss-120b (high) has the best cost efficiency in the current Artificial Analysis System Load Test (AA-SLT), at $0.19 per one million input and one million output tokens. Artificial Analysis compares systems on this cost measure at a model-specific reference speed, so the most cost-efficient hardware depends on both the model and the target output speed.
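To make the cost-efficiency comparison above more concrete, here is a minimal Python sketch of one way a cost figure could be derived from a concurrency sweep. It is an illustration under stated assumptions, not the AA-SLT methodology: the names `LoadPoint`, `throughput_at_speed`, and `cost_per_million_tokens` are hypothetical, the linear interpolation is a simplification chosen for the example, and the cost counts output tokens only, whereas the published metric combines input and output tokens in a way this page does not specify. All numbers in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass
class LoadPoint:
    """One measurement from a (hypothetical) concurrency sweep."""
    concurrency: int
    speed_per_query: float    # output tokens/s seen by a single query
    system_throughput: float  # output tokens/s summed over all queries


def throughput_at_speed(points, target_speed):
    """Linearly interpolate system throughput at a reference per-query speed.

    Assumption: linear interpolation between neighbouring measurements is
    good enough for illustration; a real benchmark may use another rule.
    """
    pts = sorted(points, key=lambda p: p.speed_per_query)
    if not pts[0].speed_per_query <= target_speed <= pts[-1].speed_per_query:
        raise ValueError("target speed is outside the measured range")
    for lo, hi in zip(pts, pts[1:]):
        if lo.speed_per_query <= target_speed <= hi.speed_per_query:
            span = hi.speed_per_query - lo.speed_per_query
            if span == 0:
                return max(lo.system_throughput, hi.system_throughput)
            frac = (target_speed - lo.speed_per_query) / span
            return lo.system_throughput + frac * (
                hi.system_throughput - lo.system_throughput
            )
    # Only reached when there is a single point that matches the target.
    return pts[0].system_throughput


def cost_per_million_tokens(price_per_gpu_hour, num_gpus, tokens_per_second):
    """Cost in dollars to generate one million tokens at a given throughput.

    Assumption: output tokens only, billed at an on-demand hourly GPU price.
    """
    system_price_per_second = price_per_gpu_hour * num_gpus / 3600
    return system_price_per_second / tokens_per_second * 1_000_000


# Made-up sweep: higher concurrency raises throughput but slows each query.
sweep = [
    LoadPoint(concurrency=64, speed_per_query=150.0, system_throughput=10_000.0),
    LoadPoint(concurrency=512, speed_per_query=50.0, system_throughput=30_000.0),
]
reference_throughput = throughput_at_speed(sweep, target_speed=100.0)
cost = cost_per_million_tokens(
    price_per_gpu_hour=4.00, num_gpus=8, tokens_per_second=reference_throughput
)
print(f"{reference_throughput:.0f} tokens/s at 100 tokens/s per query")
print(f"${cost:.2f} per million output tokens")
```

Fixing the per-query speed before computing cost is what makes systems comparable: a system run at very high concurrency would look cheaper per token, but each user would see a slower response.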
In the current Artificial Analysis System Load Test (AA-SLT), all of the following models work best on 8xB200 (SXM) with NVIDIA's B200, measured by peak throughput: DeepSeek R1 0528 (May '25) reaches 45,677 output tokens per second, Llama 4 Maverick reaches 48,198 output tokens per second, and gpt-oss-120b (high) reaches 92,909 output tokens per second.