Artificial Analysis
AI Chatbots Comparison: ChatGPT, Claude, Meta AI, Gemini and more

Which AI chatbot is best for you? We've analyzed the leading AI chatbots across every critical characteristic: intelligence, features, rate limits, context window, privacy policies and more.

We've leveraged data from our extensive benchmarking datasets to put together what we believe to be the most comprehensive comparison of AI chatbots available. Want to know which chatbot has the longest context window? The fastest speed? The best built-in Text to Image model? We've got you covered.

If you've got any questions or feedback, please reach out!

Last updated: 15 Sep 2024

Highlights

ChatGPT Plus

Best Overall

ChatGPT Plus presents the best mix of model intelligence and chatbot features. With access to GPT-4o and the widest range of features from web search to image generation to data analysis capabilities, ChatGPT Plus remains the best all-around paid chatbot.

ChatGPT Free

Best Free

ChatGPT Free gives limited access to OpenAI's leading frontier-class model, GPT-4o, and near-unlimited access to GPT-4o mini. Within the roughly 6 GPT-4o messages per hour that users are granted, ChatGPT Free offers the full feature set of ChatGPT Plus, making it the best free AI chatbot experience.

Poe Pro

Best For Images

Poe Pro includes access to FLUX.1 [pro], the leading image generation model in the Artificial Analysis Image Arena. Poe Pro supports a wide range of both AI language models and image generation models, including other leading image models such as Ideogram v2 and Playground v3 Beta.

Claude Pro

Best For Coding

Claude Pro wins the 'Best For Coding' award with support for Claude 3.5 Sonnet, a foundation model with leading scores in coding benchmarks, and long context capability that lets developers work with extensive codebase context when programming with Claude.

Claude Pro

Best For Long Context

Claude Pro supports by far the largest context window of any current consumer chatbot at 200k tokens (approx 150k words), alongside flexible file upload capabilities. With access to the Claude 3.5 Sonnet model, it is the best chatbot for long context reasoning and working with large files.

ChatGPT Plus

Best For Data

ChatGPT Plus combines the intelligence of GPT-4o with a Python code interpreter to deliver the best data analysis capability of all chatbots tested. Data files (e.g. Excel and CSV) are uploaded directly into the code interpreter, and the model writes code to analyze the data and create charts.

Compare Plan Options

| Plan | Price | Foundation Model | Intelligence | Rate Limit | Speed | Context Window |
| --- | --- | --- | --- | --- | --- | --- |
| ChatGPT Plus | $20.00 | GPT-4o (Aug '24) | 77.2 | High | Fairly fast | 25k |
| ChatGPT Free | Free | GPT-4o (Aug '24) | 77.2 | 6 messages per hour | Fairly fast | 5k |
| Claude Pro | $20.00 | Claude 3.5 Sonnet (June) | 76.9 | 46 messages per hour | Fairly fast | 160k |
| Claude Free | Free | Claude 3.5 Sonnet (June) | 76.9 | 7 messages per hour | Fairly fast | 13k |
| Gemini Advanced | $19.99 | Gemini 1.5 Pro (May) | - | High | Fast | 17k |
| Gemini Free | Free | Gemini 1.5 Flash (May) | - | High | Fast | 5k |
| Poe Pro | $19.99 | GPT-4o (Aug '24) | 77.2 | High | Fairly fast | 120k |
| Poe Free | Free | GPT-4o (Aug '24) | 77.2 | 10 messages per hour | Fairly fast | 7k |
| Perplexity Pro | $20.00 | GPT-4o (Aug '24) | 77.2 | High | Fairly fast | 40k |
| Perplexity Free | Free | Sonar 3.1 Small | 50 | High | Fairly fast | 1k |
| Microsoft Copilot Free | Free | GPT-4o (Aug '24) | 77.2 | High | Slow | 1k |
| Meta AI | Free | Llama 3.1 405B | 72.1 | High | Fairly fast | 2k |
| Grok | $8.00 | Grok-2 | 70 | High | Slow | 3k |
| Mistral Le Chat | Free | Mistral Large 2 (Jul '24) | 73.0 | High | Fairly fast | 40k |
| HuggingChat | Free | Llama 3.1 70B | 65.1 | High | Slow | 8k |
| Character AI Plus | $9.99 | Character AI | 25 | High | Slow | 3k |
| Character AI Free | Free | Character AI | 25 | High | Slow | 2k |
| ChatGPT Free (Logged Out) | Free | GPT-4o mini | 71.4 | High | Fairly fast | 5k |

Image Generation

-
-
-
-
-
-
-

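| Plan | Tools | Voice Features | Input Capabilities | Memory | Apps |
| --- | --- | --- | --- | --- | --- |
| ChatGPT Plus | 3/4 | 2/3 | 4/4 | 2/2 | 3/4 |
| ChatGPT Free | 3/4 | 2/3 | 4/4 | 2/2 | 3/4 |
| Claude Pro | 1/4 | 0/3 | 3/4 | 1/2 | 2/4 |
| Claude Free | 1/4 | 0/3 | 3/4 | 1/2 | 2/4 |
| Gemini Advanced | 3/4 | 2/3 | 4/4 | 1/2 | 2/4 |
| Gemini Free | 1/4 | 1/3 | 1/4 | 1/2 | 2/4 |
| Poe Pro | 2/4 | 1/3 | 3/4 | 1/2 | 4/4 |
| Poe Free | 2/4 | 1/3 | 3/4 | 1/2 | 4/4 |
| Perplexity Pro | 1/4 | 2/3 | 3/4 | 1/2 | 2/4 |
| Perplexity Free | 1/4 | 2/3 | 2/4 | 1/2 | 2/4 |
| Microsoft Copilot Free | 1/4 | 1/3 | 1/4 | 1/2 | 3/4 |
| Meta AI | 1/4 | 0/3 | 0/4 | 1/2 | 2/4 |
| Grok | 1/4 | 1/3 | 0/4 | 1/2 | 3/4 |
| Mistral Le Chat | 0/4 | 0/3 | 0/4 | 1/2 | 0/4 |
| HuggingChat | 1/4 | 0/3 | 3/4 | 1/2 | 1/4 |
| Character AI Plus | 0/4 | 1/3 | 0/4 | 1/2 | 2/4 |
| Character AI Free | 0/4 | 1/3 | 0/4 | 1/2 | 2/4 |
| ChatGPT Free (Logged Out) | 0/4 | 0/3 | 0/4 | 0/2 | 0/4 |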

Privacy

Intelligence

Quality Index: Foundation Model Intelligence and Reasoning

Artificial Analysis Quality Index; Higher is better
Artificial Analysis Quality Index: our synthesis metric for the overall intelligence and reasoning capability of a foundation model. We assess it using a range of leading evaluation datasets, including MMLU, GPQA, Math & HumanEval. See methodology for more details.

Effective Context Window

Effective Context Window (Tokens); Higher is better
Effective Context Window: the maximum number of combined input tokens that a chatbot was able to process in our testing.

We found that the Effective Context Window available to many chatbot applications is much lower than the full context window of the underlying foundation model. Longer context windows allow users to input more information to the chatbot, including uploading longer documents.

Output Speed

Output Tokens per Second; Higher is better
Output Speed: tokens per second received while the chatbot is generating tokens (i.e. after the first token has been received from the chatbot). This metric is calculated by dividing the number of output tokens by the number of seconds it took to generate them.

For this comparison between chatbots, we manually measured the output time. We took multiple measurements at different times to attain a representative average, but we are not conducting ongoing real-time testing (unlike in our language model API benchmarking, where our systems test APIs 8x per day).
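
As a rough illustration of the calculation described above, the following Python sketch divides the output token count by the time spent generating after the first token, then averages several runs. The function names, the timestamp-based inputs, and the choice of a simple arithmetic mean are assumptions made for this example; the page does not specify how the manual measurements were combined.

```python
def output_speed(output_tokens: int, first_token_time: float, last_token_time: float) -> float:
    """Tokens per second over the generation window.

    The window starts when the first token is received, so time spent
    waiting for the first token is excluded. Times are in seconds.
    """
    elapsed = last_token_time - first_token_time
    if elapsed <= 0:
        raise ValueError("generation window must be positive")
    return output_tokens / elapsed


def average_output_speed(measurements: list[tuple[int, float, float]]) -> float:
    """Arithmetic mean of per-run speeds (an assumed way to average runs)."""
    if not measurements:
        raise ValueError("at least one measurement is required")
    speeds = [output_speed(tokens, start, end) for tokens, start, end in measurements]
    return sum(speeds) / len(speeds)


# Hypothetical measurements: (output tokens, first-token time, last-token time)
runs = [(300, 1.0, 7.0), (200, 0.5, 5.5)]
print(average_output_speed(runs))
```

With these made-up runs, the per-run speeds are 50 and 40 tokens per second. Averaging per-run speeds weights each run equally; dividing total tokens by total time would instead weight longer runs more heavily.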

Intelligence vs Feature Score

Feature Score for Chatbots: a summary score counting the features offered by chatbots across various categories (all listed in the full comparison table above):
  • Image Generation: 1 point
  • Tools: 4 points (web search, code interpreter, data analysis, output HTML)
  • Voice Features: 3 points (voice input, voice conversation, native voice-to-voice)
  • Input Capabilities: 4 points (image upload, PDF upload, Excel/CSV upload, file source connect)
  • Memory: 2 points (memory and chat history)
The maximum total value for this metric is 14 points.
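
The scoring scheme listed above can be sketched in a few lines of Python. The category keys and the `feature_score` function are hypothetical names chosen for this example, and the sample counts are illustrative rather than taken from any specific chatbot's results.

```python
# Maximum points per category, as listed in the Feature Score definition.
FEATURE_MAX = {
    "image_generation": 1,
    "tools": 4,
    "voice_features": 3,
    "input_capabilities": 4,
    "memory": 2,
}


def feature_score(counts: dict[str, int]) -> int:
    """Sum the features a chatbot offers, validating each category's range.

    Categories missing from `counts` are treated as zero features offered.
    """
    total = 0
    for category, maximum in FEATURE_MAX.items():
        value = counts.get(category, 0)
        if not 0 <= value <= maximum:
            raise ValueError(f"{category} must be between 0 and {maximum}")
        total += value
    return total


# The category maxima add up to the stated 14-point ceiling.
assert sum(FEATURE_MAX.values()) == 14

# Illustrative counts for a hypothetical chatbot.
example = {
    "image_generation": 1,
    "tools": 3,
    "voice_features": 2,
    "input_capabilities": 4,
    "memory": 2,
}
print(feature_score(example))  # 12
```

Note that the Apps category from the comparison table is not part of this score, which is why it does not appear in the mapping.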

Intelligence vs Effective Context Window

Effective Context Window: the maximum number of combined input tokens that a chatbot was able to process in our testing.

We found that the Effective Context Window available to many chatbot applications is much lower than the full context window of the underlying foundation model. Longer context windows allow users to input more information to the chatbot, including uploading longer documents.