Output Speed: Tokens per second received while the model is generating tokens (i.e., after the first chunk has been received from the API, for models that support streaming).
Price: Price per token, expressed in USD per million tokens. Price is a blend of input and output token prices, weighted 3:1 (input to output).
Median: Figures represent the median (P50) of measurements taken over the past 72 hours, so that they reflect sustained changes in performance.
Notes: Llama 4 Maverick (FP8), Parasail: 1m context; Llama 4 Maverick, Cerebras: 32k context; Llama 4 Maverick, Amazon: 128k context; Llama 4 Maverick Vertex, Google: 524k context; Llama 4 Maverick (FP8), Azure: 128k context; Llama 4 Maverick (Base), Fireworks: 1m context; Llama 4 Maverick (FP8), Deepinfra: 1m context; Llama 4 Maverick (Turbo, FP8), Deepinfra: 8k context; Llama 4 Maverick (FP8), Novita: 1m context; Llama 4 Maverick (FP8), GMI: 1m context; Llama 4 Maverick, Groq: 131k context; Llama 4 Maverick, SambaNova: 131k context; Llama 4 Maverick, Together.ai: 1m context
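Example: The Output Speed, Price, and Median definitions above can be read as simple arithmetic. The Python sketch below is illustrative only and is not the measurement code behind these figures. The function names, the timestamps, and the prices are hypothetical, and it assumes that the token count covers only tokens received after the first chunk, which the definition does not spell out.

```python
from statistics import median


def output_speed(tokens_after_first_chunk: int,
                 first_chunk_time: float,
                 last_chunk_time: float) -> float:
    """Tokens per second while the model is generating.

    Time before the first chunk is excluded. Whether the tokens carried by
    the first chunk itself are counted is an assumption of this sketch.
    """
    generation_seconds = last_chunk_time - first_chunk_time
    if generation_seconds <= 0:
        raise ValueError("last chunk must arrive after the first chunk")
    return tokens_after_first_chunk / generation_seconds


def blended_price(input_price: float, output_price: float) -> float:
    """Blend input and output prices (USD per million tokens) 3:1."""
    return (3 * input_price + output_price) / 4


def p50(measurements: list[float]) -> float:
    """Median (P50) of a window of measurements, e.g. the past 72 hours."""
    return median(measurements)


# Hypothetical values, chosen only to make the arithmetic easy to follow.
speed = output_speed(500, first_chunk_time=0.5, last_chunk_time=5.5)  # 100.0
price = blended_price(input_price=1.0, output_price=3.0)              # 1.5
typical = p50([90.0, 110.0, 100.0])                                   # 100.0
```

Because input tokens carry three times the weight of output tokens, a change in a provider's input price moves the blended figure three times as much as the same change in its output price.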