Comparison Summary
Model | Context window | Model Intelligence | Price | Output tokens/s | Latency (s) | End-to-End Response Time (s) | |
---|---|---|---|---|---|---|---|
Llama 3.1 405B | 128k | 29 | $9.50 | 16.0 | 1.03 | 32.35 | N/A |
Llama 3.1 405B | 131k | 29 | $4.00 | 79.5 | 2.24 | 8.53 | N/A |
Llama 3.1 405B Standard | 128k | 29 | $2.40 | 25.4 | 1.83 | 21.53 | N/A |
Llama 3.1 405B Latency Optimized | 128k | 29 | $3.00 | 78.0 | 0.46 | 6.87 | N/A |
Llama 3.1 405B Base | 128k | 29 | $1.50 | 29.6 | 0.76 | 17.66 | N/A |
Llama 3.1 405B Vertex | 128k | 29 | $7.75 | 25.1 | 0.41 | 20.35 | N/A |
Llama 3.1 405B | 128k | 29 | $8.00 | 25.8 | 0.47 | 19.84 | N/A |
Llama 3.1 405B | 131k | 29 | $3.00 | 75.7 | 0.55 | 7.15 | N/A |
Llama 3.1 405B | 16k | 29 | $6.25 | 140.9 | 0.60 | 4.15 | N/A |
Llama 3.1 405B | 128k | 29 | $7.50 | 32.1 | 0.91 | 16.50 | N/A |
Llama 3.1 405B Turbo | 131k | 29 | $3.50 | 82.5 | 0.58 | 6.64 | N/A |
Output tokens/s: output speed, measured in tokens per second.
Latency: time to first token, measured in seconds.
End-to-End Response Time: seconds to output 500 tokens, calculated from time to first token, 'thinking' time for reasoning models, and output speed.
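
The end-to-end definition above can be sketched as a small calculation. This is a minimal sketch, assuming the three components simply add up (time to first token, plus any 'thinking' time, plus 500 tokens divided by output speed); the page does not spell out the exact formula, and the function name `end_to_end_seconds` is hypothetical. Several table rows match this sum to two decimals, for example the 16k-context row: 0.60 + 500 / 140.9 ≈ 4.15. Other rows differ by a few hundredths, presumably because the displayed speeds are rounded.

```python
def end_to_end_seconds(ttft_s, tokens_per_s, output_tokens=500, thinking_s=0.0):
    """Estimate seconds to receive `output_tokens` tokens.

    Assumed formula (not confirmed by the source):
        time to first token + thinking time + output_tokens / output speed
    """
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + thinking_s + output_tokens / tokens_per_s


# Values taken from the table: latency 0.60 s, 140.9 output tokens/s.
print(round(end_to_end_seconds(0.60, 140.9), 2))  # 4.15
```

The `thinking_s` parameter defaults to zero here, on the assumption that no separate 'thinking' time applies to these rows.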