Stay connected with us on X, Discord, and LinkedIn to stay up to date with future analysis

AA-Omniscience: Knowledge and Hallucination Benchmark

Name: AA-Omniscience Accuracy
Creator: Artificial Analysis
License: https://creativecommons.org/licenses/by/4.0/

A benchmark measuring factual recall and hallucination across various economically relevant domains.

Background

AA-Omniscience is a knowledge and hallucination benchmark that rewards accuracy, punishes bad guesses and provides a comprehensive view of which models produce factually reliable outputs across different domains.

The benchmark contains 6,000 questions across 6 major domains, derived from authoritative academic and industry sources and generated automatically using an LLM-based question generation agent to ensure unambiguity, scalability and factual precision

Methodology

All evaluations are conducted independently by Artificial Analysis. More information can be found on our Intelligence Benchmarking Methodology page.

Publication

View on arXiv

AA-Omniscience: Evaluating Cross-Domain Knowledge Reliability in Large Language Models

Declan Jackson, William Keating, George Cameron, Micah Hill-Smith.

Highlights

Gemini 3 Pro Preview (high) scores the highest on Omniscience with a score of 13, followed by Claude Opus 4.5 (Reasoning) with a score of 10, and Gemini 3 Flash Preview (Reasoning) with a score of 8
Gemini 3 Pro Preview (high) scores the highest on Omniscience Accuracy with a score of 54%, followed by Gemini 3 Flash Preview (Reasoning) with a score of 52%, and Gemini 3 Flash Preview (Non-reasoning) with a score of 47%
Claude 4.5 Haiku (Non-reasoning) scores the lowest on Omniscience Hallucination Rate with a score of 25%, followed by Grok 3 mini Reasoning (high) with a score of 25%, and Claude 4.5 Haiku (Reasoning) with a score of 26%

AA-Omniscience Index: Results

AA-Omniscience Index (higher is better) measures knowledge reliability and hallucination. It rewards correct answers, penalizes hallucinations, and has no penalty for refusing to answer. Scores range from -100 to 100, where 0 means as many correct as incorrect answers, and negative scores mean more incorrect than correct.

Independently conducted by Artificial Analysis

AA-Omniscience Index

AA-Omniscience Index vs. Artificial Analysis Intelligence Index

AA-Omniscience Index; Artificial Analysis Intelligence Index

Most attractive quadrant

Alibaba

Amazon

Anthropic

DeepSeek

Google

Kimi

Korea Telecom

KwaiKAT

LG AI Research

MBZUAI Institute of Foundation Models

Meta

MiniMax

Mistral

Naver

NVIDIA

OpenAI

TII UAE

xAI

Xiaomi

Z AI

Artificial Analysis Intelligence Index v4.0 includes: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.

AA-Omniscience Accuracy

AA-Omniscience Accuracy (higher is better) measures the proportion of correctly answered questions out of all questions, regardless of whether the model chooses to answer

{"@context":"https://schema.org","@type":"Dataset","name":"AA-Omniscience Accuracy","creator":{"@type":"Organization","name":"Artificial Analysis","url":"https://artificialanalysis.ai"},"description":"AA-Omniscience Accuracy (higher is better) measures the proportion of correctly answered questions out of all questions, regardless of whether the model chooses to answer","measurementTechnique":"Independent test run by Artificial Analysis on dedicated hardware.","spatialCoverage":"Worldwide","keywords":["analytics","llm","AI","benchmark","model","gpt","claude"],"license":"https://creativecommons.org/licenses/by/4.0/","isAccessibleForFree":true,"citation":"Artificial Analysis (2025). LLM benchmarks dataset. https://artificialanalysis.ai","data":"modelName,omniscienceAccuracy,detailsUrl,isLabClaimedValue\nGemini 3 Pro Preview (high),0.5365,/models/gemini-3-pro/providers,false\nGemini 3 Flash,0.5186666666666667,/models/gemini-3-flash-reasoning/providers,false\nGemini 3 Flash,0.4715,/models/gemini-3-flash/providers,false\nGemini 3 Pro Preview (low),0.46016666666666667,/models/gemini-3-pro-low/providers,false\nClaude Opus 4.5,0.43116666666666664,/models/claude-opus-4-5-thinking/providers,false\nGPT-5.2 (xhigh),0.4135,/models/gpt-5-2/providers,false\nGrok 4,0.39566666666666667,/models/grok-4/providers,false\nClaude Opus 4.5,0.389,/models/claude-opus-4-5/providers,false\nGPT-5 (high),0.38616666666666666,/models/gpt-5/providers,false\nGPT-5.2 Codex (xhigh),0.3845,/models/gpt-5-2-codex/providers,false\nGemini 2.5 Pro,0.37483333333333335,/models/gemini-2-5-pro/providers,false\nGPT-5 (medium),0.37383333333333335,/models/gpt-5-medium/providers,false\no3,0.3725,/models/o3/providers,false\nGPT-5 Codex (high),0.371,/models/gpt-5-codex/providers,false\nDeepSeek V3.2 Speciale,0.3675,/models/deepseek-v3-2-speciale/providers,false\nGPT-5 (low),0.3635,/models/gpt-5-low/providers,false\nClaude 4.1 Opus,0.35933333333333334,/models/claude-4-1-opus-thinking/providers,false\nGPT-5.2 (medium),0.35533333333333333,/models/gpt-5-2-medium/providers,false\nGPT-5.1 (high),0.353,/models/gpt-5-1/providers,false\no1,0.3313333333333333,/models/o1/providers,false\nKimi K2.5,0.3265,/models/kimi-k2-5/providers,false\nDeepSeek V3.2,0.32216666666666666,/models/deepseek-v3-2-reasoning/providers,false\nClaude 4.5 Sonnet,0.309,/models/claude-4-5-sonnet-thinking/providers,false\nDeepSeek R1 0528,0.29283333333333333,/models/deepseek-r1/providers,false\nKimi K2 Thinking,0.29233333333333333,/models/kimi-k2-thinking/providers,false\nDeepSeek R1 (Jan),0.2921666666666667,/models/deepseek-r1-0120/providers,false\nHermes 4 405B,0.2911666666666667,/models/hermes-4-llama-3-1-405b-reasoning/providers,false\nCogito v2.1,0.2875,/models/cogito-v2-1-reasoning/providers,false\nQwen3 Max Thinking,0.28733333333333333,/models/qwen3-max-thinking/providers,false\nGLM-4.7,0.2843333333333333,/models/glm-4-7/providers,false\nDeepSeek V3.1,0.281,/models/deepseek-v3-1-reasoning/providers,false\nGPT-5.2,0.2795,/models/gpt-5-2-non-reasoning/providers,false\nGPT-5.1,0.278,/models/gpt-5-1-non-reasoning/providers,false\nGrok 3,0.27366666666666667,/models/grok-3/providers,false\nGPT-5 (minimal),0.27216666666666667,/models/gpt-5-minimal/providers,false\nDeepSeek V3.1 Terminus,0.27166666666666667,/models/deepseek-v3-1-terminus-reasoning/providers,false\nClaude 3.7 Sonnet,0.2698333333333333,/models/claude-3-7-sonnet/providers,false\nGemini 2.5 Flash (Sep),0.2698333333333333,/models/gemini-2-5-flash-preview-09-2025-reasoning/providers,false\nDeepSeek V3.2 Exp,0.26966666666666667,/models/deepseek-v3-2-reasoning-0925/providers,false\nClaude 4.5 Sonnet,0.2693333333333333,/models/claude-4-5-sonnet/providers,false\nClaude 3.7 Sonnet,0.26766666666666666,/models/claude-3-7-sonnet-thinking/providers,false\nQwen3 Max Thinking (Preview),0.26616666666666666,/models/qwen3-max-thinking-preview/providers,false\nMiMo-V2-Flash,0.2633333333333333,/models/mimo-v2-flash-reasoning/providers,false\nGPT-4.1,0.2608333333333333,/models/gpt-4-1/providers,false\nKimi K2,0.258,/models/kimi-k2/providers,false\nGemini 2.5 Flash (Sep),0.25766666666666665,/models/gemini-2-5-flash-preview-09-2025/providers,false\nGLM-4.6,0.25483333333333336,/models/glm-4-6-reasoning/providers,false\nGemini 2.5 Flash,0.2535,/models/gemini-2-5-flash/providers,false\nHermes 4 405B,0.2495,/models/hermes-4-llama-3-1-405b/providers,false\nGemini 2.5 Flash,0.249,/models/gemini-2-5-flash-reasoning/providers,false\nDeepSeek V3 (Dec),0.24066666666666667,/models/deepseek-v3/providers,false\nKimi K2 0905,0.24033333333333334,/models/kimi-k2-0905/providers,false\nGLM-4.5,0.23866666666666667,/models/glm-4.5/providers,false\nDoubao Seed Code,0.23866666666666667,/models/doubao-seed-code/providers,false\nMistral Large 3,0.23683333333333334,/models/mistral-large-3/providers,false\nLlama 4 Maverick,0.23516666666666666,/models/llama-4-maverick/providers,false\nQwen3 Max (Preview),0.23516666666666666,/models/qwen3-max-preview/providers,false\nGrok 4.1 Fast,0.235,/models/grok-4-1-fast-reasoning/providers,false\no4-mini (high),0.2335,/models/o4-mini/providers,false\nQwen3 Max,0.2335,/models/qwen3-max/providers,false\nERNIE 5.0 Thinking Preview,0.23183333333333334,/models/ernie-5-0-thinking-preview/providers,false\nDeepSeek V3 0324,0.23166666666666666,/models/deepseek-v3-0324/providers,false\nGPT-5.1 Codex (high),0.23133333333333334,/models/gpt-5-1-codex/providers,false\nGPT-5 mini (high),0.22966666666666666,/models/gpt-5-mini/providers,false\nGLM-4.7,0.22916666666666666,/models/glm-4-7-non-reasoning/providers,false\nDeepSeek V3.2,0.22766666666666666,/models/deepseek-v3-2/providers,false\nGrok Code Fast 1,0.2275,/models/grok-code-fast-1/providers,false\nDeepSeek V3.1 Terminus,0.22616666666666665,/models/deepseek-v3-1-terminus/providers,false\nDeepSeek V3.1,0.223,/models/deepseek-v3-1/providers,false\nHermes 4 70B,0.223,/models/hermes-4-llama-3-1-70b-reasoning/providers,false\nKimi K2.5,0.22166666666666668,/models/kimi-k2-5-non-reasoning/providers,false\nGemini 2.0 Flash,0.22116666666666668,/models/gemini-2-0-flash/providers,false\nQwen3 235B A22B 2507,0.22116666666666668,/models/qwen3-235b-a22b-instruct-2507-reasoning/providers,false\nMiniMax-M2.1,0.22066666666666668,/models/minimax-m2-1/providers,false\nDeepSeek V3.2 Exp,0.22066666666666668,/models/deepseek-v3-2-0925/providers,false\nGrok 4 Fast,0.22033333333333333,/models/grok-4-fast-reasoning/providers,false\nLlama 3.1 405B,0.21833333333333332,/models/llama-3-1-instruct-405b/providers,false\nGPT-5.1 Codex mini (high),0.21733333333333332,/models/gpt-5-1-codex-mini/providers,false\nClaude 4 Sonnet,0.2145,/models/claude-4-sonnet/providers,false\nGPT-5 mini (medium),0.21233333333333335,/models/gpt-5-mini-medium/providers,false\nClaude 4 Sonnet,0.2105,/models/claude-4-sonnet-thinking/providers,false\nNova 2.0 Pro Preview (low),0.20983333333333334,/models/nova-2-0-pro-reasoning-low/providers,false\nNova 2.0 Pro Preview (medium),0.2095,/models/nova-2-0-pro-reasoning-medium/providers,false\nGPT-4o (Aug),0.20883333333333334,/models/gpt-4o-2024-08-06/providers,false\nMiniMax-M2,0.20833333333333334,/models/minimax-m2/providers,false\nMiniMax M1 80k,0.20566666666666666,/models/minimax-m1-80k/providers,false\nQwen3 VL 235B A22B,0.2045,/models/qwen3-vl-235b-a22b-reasoning/providers,false\nGLM-4.6,0.20266666666666666,/models/glm-4-6/providers,false\no3-mini (high),0.20166666666666666,/models/o3-mini-high/providers,false\nGLM-4.5V,0.20133333333333334,/models/glm-4-5v-reasoning/providers,false\nMagistral Medium 1.2,0.20083333333333334,/models/magistral-medium-2509/providers,false\ngpt-oss-120B (high),0.20016666666666666,/models/gpt-oss-120b/providers,false\nDevstral 2,0.198,/models/devstral-2/providers,false\nRing-1T,0.19483333333333333,/models/ring-1t/providers,false\nMagistral Medium 1,0.1945,/models/magistral-medium/providers,false\nQwen3 VL 235B A22B,0.19216666666666668,/models/qwen3-vl-235b-a22b-instruct/providers,false\nLlama Nemotron Ultra,0.192,/models/llama-3-1-nemotron-ultra-253b-v1-reasoning/providers,false\nQwen3 VL 8B,0.19166666666666668,/models/qwen3-vl-8b-instruct/providers,false\nMistral Large 2 (Nov),0.191,/models/mistral-large-2/providers,false\nNova Premier,0.18983333333333333,/models/nova-premier/providers,false\nQwen3 VL 8B,0.18983333333333333,/models/qwen3-vl-8b-reasoning/providers,false\nMi:dm K 2.5 Pro,0.18966666666666668,/models/mi-dm-k-2-5-pro-dec28/providers,false\nMistral Medium 3.1,0.1895,/models/mistral-medium-3-1/providers,false\nLlama 3.1 70B,0.18866666666666668,/models/llama-3-1-instruct-70b/providers,false\nGPT-4.1 mini,0.18733333333333332,/models/gpt-4-1-mini/providers,false\nLing-1T,0.18733333333333332,/models/ling-1t/providers,false\nGPT-4o (Nov),0.18666666666666668,/models/gpt-4o/providers,false\nSolar Pro 2,0.18533333333333332,/models/solar-pro-2-reasoning/providers,false\nMi:dm K 2.5 Pro Preview,0.18516666666666667,/models/midm-250-pro-rsnsft/providers,false\nDeepSeek R1 Distill Llama 70B,0.185,/models/deepseek-r1-distill-llama-70b/providers,false\nKAT-Coder-Pro V1,0.18466666666666667,/models/kat-coder-pro-v1/providers,false\nINTELLECT-3,0.18366666666666667,/models/intellect-3/providers,false\nGPT-5 nano (high),0.18283333333333332,/models/gpt-5-nano/providers,false\nQwen3 Next 80B A3B,0.18216666666666667,/models/qwen3-next-80b-a3b-reasoning/providers,false\nDevstral Medium,0.18166666666666667,/models/devstral-medium/providers,false\ngpt-oss-120B (low),0.1815,/models/gpt-oss-120b-low/providers,false\nERNIE 4.5 300B A47B,0.17833333333333334,/models/ernie-4-5-300b-a47b/providers,false\nHermes 4 70B,0.17833333333333334,/models/hermes-4-llama-3-1-70b/providers,false\nLlama 3.3 70B,0.17833333333333334,/models/llama-3-3-instruct-70b/providers,false\nNova 2.0 Omni (low),0.1775,/models/nova-2-0-omni-reasoning-low/providers,false\nQwen3 235B,0.17666666666666667,/models/qwen3-235b-a22b-instruct-reasoning/providers,false\nQwen3 235B 2507,0.17583333333333334,/models/qwen3-235b-a22b-instruct-2507/providers,false\nMagistral Small 1,0.17416666666666666,/models/magistral-small/providers,false\nNova 2.0 Lite (medium),0.17366666666666666,/models/nova-2-0-lite-reasoning-medium/providers,false\nSeed-OSS-36B-Instruct,0.17333333333333334,/models/seed-oss-36b-instruct/providers,false\nMistral Medium 3,0.173,/models/mistral-medium-3/providers,false\nQwen3 235B,0.173,/models/qwen3-235b-a22b-instruct/providers,false\nGLM-4.5V,0.173,/models/glm-4-5v/providers,false\nNova 2.0 Omni (medium),0.172,/models/nova-2-0-omni-reasoning-medium/providers,false\nLlama 3 70B,0.17166666666666666,/models/llama-3-instruct-70b/providers,false\nGemini 2.5 Flash-Lite,0.17133333333333334,/models/gemini-2-5-flash-lite-reasoning/providers,false\nGemini 2.5 Flash-Lite (Sep),0.17133333333333334,/models/gemini-2-5-flash-lite-preview-09-2025-reasoning/providers,false\nGPT-5 mini (minimal),0.171,/models/gpt-5-mini-minimal/providers,false\nSolar Open 100B,0.17016666666666666,/models/solar-open-100b-reasoning/providers,false\nK2-V2 (high),0.17,/models/k2-v2/providers,false\nQwen2.5 72B,0.16933333333333334,/models/qwen2-5-72b-instruct/providers,false\nNova 2.0 Lite (low),0.1675,/models/nova-2-0-lite-reasoning-low/providers,false\nQwen3 Next 80B A3B,0.16716666666666666,/models/qwen3-next-80b-a3b-instruct/providers,false\nGLM-4.6V,0.16666666666666666,/models/glm-4-6v/providers,false\nApriel-v1.6-15B-Thinker,0.166,/models/apriel-v1-6-15b-thinker/providers,false\nK2-V2 (medium),0.16566666666666666,/models/k2-v2-medium/providers,false\nNVIDIA Nemotron 3 Nano,0.16516666666666666,/models/nvidia-nemotron-3-nano-30b-a3b-reasoning/providers,false\nQwen3 32B,0.165,/models/qwen3-32b-instruct-reasoning/providers,false\nGrok 4 Fast,0.16483333333333333,/models/grok-4-fast/providers,false\nGPT-5 nano (medium),0.16383333333333333,/models/gpt-5-nano-medium/providers,false\nQwen3 VL 32B,0.16366666666666665,/models/qwen3-vl-32b-reasoning/providers,false\nNova Pro,0.1635,/models/nova-pro/providers,false\nMiMo-V2-Flash,0.16316666666666665,/models/mimo-v2-flash/providers,false\nLlama 3.1 Nemotron 70B,0.16266666666666665,/models/llama-3-1-nemotron-instruct-70b/providers,false\nClaude 4.5 Haiku,0.16183333333333333,/models/claude-4-5-haiku-reasoning/providers,false\nLlama Nemotron Super 49B v1.5,0.16083333333333333,/models/llama-nemotron-super-49b-v1-5-reasoning/providers,false\nQwen3 VL 30B A3B,0.15933333333333333,/models/qwen3-vl-30b-a3b-reasoning/providers,false\nNova 2.0 Pro Preview,0.15866666666666668,/models/nova-2-0-pro/providers,false\nGrok 4.1 Fast,0.158,/models/grok-4-1-fast/providers,false\nK-EXAONE,0.1565,/models/k-exaone/providers,false\nHyperCLOVA X SEED Think (32B),0.15616666666666668,/models/hyperclova-x-seed-think-32b/providers,false\nRing-flash-2.0,0.155,/models/ring-flash-2-0/providers,false\nQwen3 30B A3B 2507,0.15466666666666667,/models/qwen3-30b-a3b-2507-reasoning/providers,false\nQwen3 30B,0.1535,/models/qwen3-30b-a3b-instruct-reasoning/providers,false\nApriel-v1.5-15B-Thinker,0.153,/models/apriel-v1-5-15b-thinker/providers,false\nGLM-4.7-Flash,0.15266666666666667,/models/glm-4-7-flash/providers,false\nK2 Think V2,0.15216666666666667,/models/k2-think-v2/providers,false\nGLM-4.6V,0.152,/models/glm-4-6v-reasoning/providers,false\nK2-V2 (low),0.15133333333333332,/models/k2-v2-low/providers,false\nQwen3 Coder 30B A3B,0.15133333333333332,/models/qwen3-coder-30b-a3b-instruct/providers,false\nCommand A,0.15066666666666667,/models/command-a/providers,false\nGLM-4.5-Air,0.1505,/models/glm-4-5-air/providers,false\nSolar Pro 2,0.15033333333333335,/models/solar-pro-2/providers,false\nDevstral Small 2,0.14983333333333335,/models/devstral-small-2/providers,false\nDevstral Small (May),0.14966666666666667,/models/devstral-small-2505/providers,false\nGemini 2.5 Flash-Lite,0.14833333333333334,/models/gemini-2-5-flash-lite/providers,false\nQwen3 VL 30B A3B,0.1475,/models/qwen3-vl-30b-a3b-instruct/providers,false\ngpt-oss-20B (high),0.1465,/models/gpt-oss-20b/providers,false\nGrok 3 mini Reasoning (high),0.145,/models/grok-3-mini-reasoning/providers,false\nQwen3 Coder 480B,0.14433333333333334,/models/qwen3-coder-480b-a35b-instruct/providers,false\nLlama 4 Scout,0.1435,/models/llama-4-scout/providers,false\nMistral Small 3.1,0.14316666666666666,/models/mistral-small-3-1/providers,false\nQwen3 30B A3B 2507,0.143,/models/qwen3-30b-a3b-2507/providers,false\nMistral Small 3.2,0.14266666666666666,/models/mistral-small-3-2/providers,false\nOlmo 3.1 32B Think,0.14166666666666666,/models/olmo-3-1-32b-think/providers,false\nQwen3 14B,0.1415,/models/qwen3-14b-instruct-reasoning/providers,false\nMotif-2-12.7B,0.14116666666666666,/models/motif-2-12-7b/providers,false\nQwen3 Omni 30B A3B,0.14066666666666666,/models/qwen3-omni-30b-a3b-reasoning/providers,false\nQwen3 VL 32B,0.1405,/models/qwen3-vl-32b-instruct/providers,false\nOlmo 3 32B Think,0.13983333333333334,/models/olmo-3-32b-think/providers,false\nDevstral Small,0.1395,/models/devstral-small/providers,false\nQwen3 Omni 30B A3B,0.13833333333333334,/models/qwen3-omni-30b-a3b-instruct/providers,false\nLing-flash-2.0,0.13733333333333334,/models/ling-flash-2-0/providers,false\ngpt-oss-20B (low),0.13683333333333333,/models/gpt-oss-20b-low/providers,false\nFalcon-H1R-7B,0.136,/models/falcon-h1r-7b/providers,false\nGranite 4.0 H Small,0.1345,/models/granite-4-0-h-small/providers,false\nClaude 4.5 Haiku,0.13416666666666666,/models/claude-4-5-haiku/providers,false\nGemini 2.5 Flash-Lite (Sep),0.1335,/models/gemini-2-5-flash-lite-preview-09-2025/providers,false\nNVIDIA Nemotron Nano 12B v2 VL,0.1335,/models/nvidia-nemotron-nano-12b-v2-vl-reasoning/providers,false\nEXAONE 4.0 32B,0.13333333333333333,/models/exaone-4-0-32b-reasoning/providers,false\nNova 2.0 Lite,0.13283333333333333,/models/nova-2-0-lite/providers,false\nReka Flash 3,0.13183333333333333,/models/reka-flash-3/providers,false\nNVIDIA Nemotron 3 Nano,0.12966666666666668,/models/nvidia-nemotron-3-nano-30b-a3b/providers,false\nPhi-4,0.12933333333333333,/models/phi-4/providers,false\nLlama 3.3 Nemotron Super 49B,0.12916666666666668,/models/llama-3-3-nemotron-super-49b/providers,false\nQwen3 4B 2507,0.12883333333333333,/models/qwen3-4b-2507-instruct-reasoning/providers,false\nGPT-4.1 nano,0.12783333333333333,/models/gpt-4-1-nano/providers,false\nMagistral Small 1.2,0.12766666666666668,/models/magistral-small-2509/providers,false\nK-EXAONE,0.12733333333333333,/models/k-exaone-non-reasoning/providers,false\nQwen3 8B,0.12733333333333333,/models/qwen3-8b-instruct-reasoning/providers,false\nClaude 3.5 Haiku,0.126,/models/claude-3-5-haiku/providers,false\nGLM-4.7-Flash,0.12433333333333334,/models/glm-4-7-flash-non-reasoning/providers,false\nNova 2.0 Omni,0.12383333333333334,/models/nova-2-0-omni/providers,false\nQwen3 14B,0.12366666666666666,/models/qwen3-14b-instruct/providers,false\nStep3 VL 10B,0.12266666666666666,/models/step3-vl-10b/providers,false\nMinistral 3 14B,0.12133333333333333,/models/ministral-3-14b/providers,false\nGemma 3 27B,0.12116666666666667,/models/gemma-3-27b/providers,false\nQwen3 4B 2507,0.11866666666666667,/models/qwen3-4b-2507-instruct/providers,false\nMinistral 3 8B,0.11816666666666667,/models/ministral-3-8b/providers,false\nLlama Nemotron Super 49B v1.5,0.115,/models/llama-nemotron-super-49b-v1-5/providers,false\nQwen3 VL 4B,0.115,/models/qwen3-vl-4b-reasoning/providers,false\nGPT-5 nano (minimal),0.11366666666666667,/models/gpt-5-nano-minimal/providers,false\nDeepSeek R1 0528 Qwen3 8B,0.1125,/models/deepseek-r1-qwen3-8b/providers,false\nOlmo 3.1 32B Instruct,0.1115,/models/olmo-3-1-32b-instruct/providers,false\nQwen3 30B,0.11133333333333334,/models/qwen3-30b-a3b-instruct/providers,false\nMolmo2-8B,0.1085,/models/molmo2-8b/providers,false\nNVIDIA Nemotron Nano 9B V2,0.10733333333333334,/models/nvidia-nemotron-nano-9b-v2-reasoning/providers,false\nNVIDIA Nemotron Nano 12B v2 VL,0.107,/models/nvidia-nemotron-nano-12b-v2-vl/providers,false\nQwen3 8B,0.10283333333333333,/models/qwen3-8b-instruct/providers,false\nOlmo 3 7B Think,0.10166666666666667,/models/olmo-3-7b-think/providers,false\nQwen3 VL 4B,0.1005,/models/qwen3-vl-4b-instruct/providers,false\nGemma 3 12B,0.0995,/models/gemma-3-12b/providers,false\nEXAONE 4.0 32B,0.09933333333333333,/models/exaone-4-0-32b/providers,false\nLlama 3.2 11B (Vision),0.096,/models/llama-3-2-instruct-11b-vision/providers,false\nNova Micro,0.09366666666666666,/models/nova-micro/providers,false\nNova Lite,0.09316666666666666,/models/nova-lite/providers,false\nLlama 3 8B,0.09283333333333334,/models/llama-3-instruct-8b/providers,false\nNVIDIA Nemotron Nano 9B V2,0.09033333333333333,/models/nvidia-nemotron-nano-9b-v2/providers,false\nGranite 4.0 Micro,0.08716666666666667,/models/granite-4-0-micro/providers,false\nGranite 3.3 8B,0.08416666666666667,/models/granite-3-3-8b-instruct/providers,false\nQwen3 1.7B,0.08383333333333333,/models/qwen3-1.7b-instruct-reasoning/providers,false\nLlama 3.1 8B,0.0795,/models/llama-3-1-instruct-8b/providers,false\nMinistral 3 3B,0.07783333333333334,/models/ministral-3-3b/providers,false\nPhi-4 Mini,0.07733333333333334,/models/phi-4-mini/providers,false\nMistral 7B,0.07716666666666666,/models/mistral-7b-instruct/providers,false\nGemma 3n E4B,0.07416666666666667,/models/gemma-3n-e4b/providers,false\nQwen3 1.7B,0.07333333333333333,/models/qwen3-1.7b-instruct/providers,false\nGemma 3 4B,0.072,/models/gemma-3-4b/providers,false\nLFM2 8B A1B,0.071,/models/lfm2-8b-a1b/providers,false\nOlmo 3 7B,0.06966666666666667,/models/olmo-3-7b-instruct/providers,false\nLlama 3.2 1B,0.06566666666666666,/models/llama-3-2-instruct-1b/providers,false\nJamba Reasoning 3B,0.06533333333333333,/models/jamba-reasoning-3b/providers,false\nLFM2.5-1.2B-Thinking,0.06533333333333333,/models/lfm2-5-1-2b-thinking/providers,false\nLFM2.5-1.2B-Instruct,0.06316666666666666,/models/lfm2-5-1-2b-instruct/providers,false\nGemma 3n E2B,0.06233333333333333,/models/gemma-3n-e2b/providers,false\nExaone 4.0 1.2B,0.061,/models/exaone-4-0-1-2b-reasoning/providers,false\nGranite 4.0 1B,0.058833333333333335,/models/granite-4-0-nano-1b/providers,false\nQwen3 0.6B,0.0555,/models/qwen3-0.6b-instruct-reasoning/providers,false\nLFM2.5-VL-1.6B,0.052333333333333336,/models/lfm2-5-vl-1-6b/providers,false\nLFM2 2.6B,0.050166666666666665,/models/lfm2-2-6b/providers,false\nGranite 4.0 H 1B,0.050166666666666665,/models/granite-4-0-h-nano-1b/providers,false\nExaone 4.0 1.2B,0.04616666666666667,/models/exaone-4-0-1-2b/providers,false\nLFM2 1.2B,0.043333333333333335,/models/lfm2-1-2b/providers,false\nQwen3 0.6B,0.042,/models/qwen3-0.6b-instruct/providers,false\nGranite 4.0 350M,0.0375,/models/granite-4-0-350m/providers,false\nGranite 4.0 H 350M,0.034666666666666665,/models/granite-4-0-h-350m/providers,false\nGemma 3 1B,0.033666666666666664,/models/gemma-3-1b/providers,false\nJamba 1.7 Large,0.0016666666666666668,/models/jamba-1-7-large/providers,false\nLing-mini-2.0,0.0016666666666666668,/models/ling-mini-2-0/providers,false\nJamba 1.7 Mini,0.0006666666666666666,/models/jamba-1-7-mini/providers,false"}

AA-Omniscience Hallucination Rate

AA-Omniscience Hallucination Rate (lower is better) measures how often the model answers incorrectly when it should have refused or admitted to not knowing the answer. It is defined as the proportion of incorrect answers out of all non-correct responses, i.e. incorrect / (incorrect + partial answers + not attempted).

{"@context":"https://schema.org","@type":"Dataset","name":"AA-Omniscience Hallucination Rate","creator":{"@type":"Organization","name":"Artificial Analysis","url":"https://artificialanalysis.ai"},"description":"AA-Omniscience Hallucination Rate (lower is better) measures how often the model answers incorrectly when it should have refused or admitted to not knowing the answer. It is defined as the proportion of incorrect answers out of all non-correct responses, i.e. incorrect / (incorrect + partial answers + not attempted).","measurementTechnique":"Independent test run by Artificial Analysis on dedicated hardware.","spatialCoverage":"Worldwide","keywords":["analytics","llm","AI","benchmark","model","gpt","claude"],"license":"https://creativecommons.org/licenses/by/4.0/","isAccessibleForFree":true,"citation":"Artificial Analysis (2025). LLM benchmarks dataset. https://artificialanalysis.ai","data":"modelName,omniscienceHallucinationRate,detailsUrl,isLabClaimedValue\nJamba 1.7 Large,1,/models/jamba-1-7-large/providers,false\nGemma 3 4B,0.9807830459770115,/models/gemma-3-4b/providers,false\nJamba 1.7 Mini,0.9714285714285714,/models/jamba-1-7-mini/providers,false\nQwen3 Omni 30B A3B,0.9700193423597679,/models/qwen3-omni-30b-a3b-instruct/providers,false\nGemma 3 12B,0.9683509161576902,/models/gemma-3-12b/providers,false\nQwen3 1.7B,0.9679856115107913,/models/qwen3-1.7b-instruct/providers,false\nQwen3 VL 4B,0.9662775616083009,/models/qwen3-vl-4b-instruct/providers,false\nGemma 3n E4B,0.9656165616561656,/models/gemma-3n-e4b/providers,false\nGranite 4.0 H 350M,0.962707182320442,/models/granite-4-0-h-350m/providers,false\nLFM2.5-1.2B-Thinking,0.9618402282453637,/models/lfm2-5-1-2b-thinking/providers,false\nLFM2.5-VL-1.6B,0.9584945480126627,/models/lfm2-5-vl-1-6b/providers,false\nQwen3 8B,0.955043655953929,/models/qwen3-8b-instruct/providers,false\nGranite 3.3 8B,0.953958143767061,/models/granite-3-3-8b-instruct/providers,false\nGranite 4.0 Micro,0.9538068285557787,/models/granite-4-0-micro/providers,false\nQwen3 0.6B,0.9504175365344467,/models/qwen3-0.6b-instruct/providers,false\nQwen3 1.7B,0.946698199017646,/models/qwen3-1.7b-instruct-reasoning/providers,false\nQwen3 30B A3B 2507,0.9463243873978997,/models/qwen3-30b-a3b-2507/providers,false\nGranite 4.0 1B,0.9443952541172304,/models/granite-4-0-nano-1b/providers,false\nGLM-4.7-Flash,0.9434716406547392,/models/glm-4-7-flash-non-reasoning/providers,false\nExaone 4.0 1.2B,0.9432019879304224,/models/exaone-4-0-1-2b-reasoning/providers,false\nLing-flash-2.0,0.9410741885625966,/models/ling-flash-2-0/providers,false\nHermes 4 70B,0.9397254397254398,/models/hermes-4-llama-3-1-70b-reasoning/providers,false\nNVIDIA Nemotron Nano 12B v2 VL,0.9391564016424039,/models/nvidia-nemotron-nano-12b-v2-vl/providers,false\nHermes 4 405B,0.9379261697625205,/models/hermes-4-llama-3-1-405b-reasoning/providers,false\nOlmo 3 7B Think,0.9365491651205937,/models/olmo-3-7b-think/providers,false\nQwen3 Max Thinking,0.9349859681945744,/models/qwen3-max-thinking/providers,false\nLing-1T,0.9339622641509434,/models/ling-1t/providers,false\nSolar Pro 2,0.9337152209492635,/models/solar-pro-2-reasoning/providers,false\ngpt-oss-20B (high),0.9320445225541887,/models/gpt-oss-20b/providers,false\nQwen3 0.6B,0.9317098994176813,/models/qwen3-0.6b-instruct-reasoning/providers,false\nGLM-4.6,0.9308879445314248,/models/glm-4-6-reasoning/providers,false\nNova 2.0 Omni (medium),0.928743961352657,/models/nova-2-0-omni-reasoning-medium/providers,false\nMinistral 3 8B,0.9276129276129276,/models/ministral-3-8b/providers,false\nQwen3 Next 80B A3B,0.9269561737042226,/models/qwen3-next-80b-a3b-instruct/providers,false\nGemma 3n E2B,0.9262353359402773,/models/gemma-3n-e2b/providers,false\nGemini 2.5 Flash,0.9256530475552579,/models/gemini-2-5-flash/providers,false\nMiMo-V2-Flash,0.9253393665158371,/models/mimo-v2-flash-reasoning/providers,false\nDeepSeek V3.2,0.9251186879585671,/models/deepseek-v3-2/providers,false\nLing-mini-2.0,0.925,/models/ling-mini-2-0/providers,false\nQwen3 VL 30B A3B,0.9241446725317694,/models/qwen3-vl-30b-a3b-instruct/providers,false\nGLM-4.7,0.923027027027027,/models/glm-4-7-non-reasoning/providers,false\nGPT-4.1,0.9228861330326945,/models/gpt-4-1/providers,false\nGLM-4.5-Air,0.9205414949970571,/models/glm-4-5-air/providers,false\nQwen3 14B,0.9205020920502092,/models/qwen3-14b-instruct/providers,false\nExaone 4.0 1.2B,0.920321509697711,/models/exaone-4-0-1-2b/providers,false\nNVIDIA Nemotron Nano 12B v2 VL,0.9197922677437969,/models/nvidia-nemotron-nano-12b-v2-vl-reasoning/providers,false\nSolar Pro 2,0.9197724597881523,/models/solar-pro-2/providers,false\nQwen3 VL 4B,0.9195856873822975,/models/qwen3-vl-4b-reasoning/providers,false\nApriel-v1.6-15B-Thinker,0.916466826538769,/models/apriel-v1-6-15b-thinker/providers,false\nGPT-4.1 mini,0.9159146841673503,/models/gpt-4-1-mini/providers,false\nMi:dm K 2.5 Pro,0.9154668860551214,/models/mi-dm-k-2-5-pro-dec28/providers,false\nOlmo 3 7B,0.915263346470799,/models/olmo-3-7b-instruct/providers,false\nMi:dm K 2.5 Pro Preview,0.9147064839435467,/models/midm-250-pro-rsnsft/providers,false\nDeepSeek V3.2 Exp,0.9133875106928999,/models/deepseek-v3-2-0925/providers,false\nGemma 3 27B,0.9110563246728618,/models/gemma-3-27b/providers,false\nLFM2 8B A1B,0.9108360243989954,/models/lfm2-8b-a1b/providers,false\nGemini 3 Flash,0.9094922737306843,/models/gemini-3-flash/providers,false\nNova 2.0 Lite (medium),0.9076240419524002,/models/nova-2-0-lite-reasoning-medium/providers,false\nMagistral Small 1.2,0.9073366450133741,/models/magistral-small-2509/providers,false\nQwen3 VL 32B,0.9069226294357184,/models/qwen3-vl-32b-instruct/providers,false\nGemini 3 Flash,0.9065096952908587,/models/gemini-3-flash-reasoning/providers,false\ngpt-oss-120B (low),0.9051109753614335,/models/gpt-oss-120b-low/providers,false\nMinistral 3 14B,0.904969650986343,/models/ministral-3-14b/providers,false\nQwen3 Max Thinking (Preview),0.9048376107199637,/models/qwen3-max-thinking-preview/providers,false\nQwen3 VL 8B,0.9047521086196256,/models/qwen3-vl-8b-reasoning/providers,false\nQwen3 VL 235B A22B,0.904683309263462,/models/qwen3-vl-235b-a22b-instruct/providers,false\nGLM-4.7,0.904052165812762,/models/glm-4-7/providers,false\nGemini 2.5 Flash (Sep),0.9036820835204311,/models/gemini-2-5-flash-preview-09-2025/providers,false\nQwen3 8B,0.9035523300229182,/models/qwen3-8b-instruct-reasoning/providers,false\nReka Flash 3,0.903244384718756,/models/reka-flash-3/providers,false\nQwen3 VL 8B,0.902680412371134,/models/qwen3-vl-8b-instruct/providers,false\nK-EXAONE,0.9016424751718869,/models/k-exaone-non-reasoning/providers,false\nNova 2.0 Pro Preview (medium),0.9013282732447818,/models/nova-2-0-pro-reasoning-medium/providers,false\nMolmo2-8B,0.9005421574126005,/models/molmo2-8b/providers,false\ngpt-oss-120B (high),0.899562408835174,/models/gpt-oss-120b/providers,false\nNVIDIA Nemotron 3 Nano,0.8981233243967829,/models/nvidia-nemotron-3-nano-30b-a3b/providers,false\nGLM-4.5V,0.8968158000806127,/models/glm-4-5v/providers,false\nQwen3 235B A22B 2507,0.8964262786218703,/models/qwen3-235b-a22b-instruct-2507-reasoning/providers,false\nK2-V2 (high),0.8949799196787148,/models/k2-v2/providers,false\nQwen3 VL 30B A3B,0.8929421094369548,/models/qwen3-vl-30b-a3b-reasoning/providers,false\nGLM-4.7-Flash,0.8928009441384737,/models/glm-4-7-flash/providers,false\nLlama 3 8B,0.892155061546941,/models/llama-3-instruct-8b/providers,false\nLlama 3.3 70B,0.8920892494929006,/models/llama-3-3-instruct-70b/providers,false\nGPT-5.1,0.891735918744229,/models/gpt-5-1-non-reasoning/providers,false\nDeepSeek V3 (Dec),0.8915715539947322,/models/deepseek-v3/providers,false\nRing-flash-2.0,0.8907297830374753,/models/ring-flash-2-0/providers,false\nMiniMax M1 80k,0.8904741921947126,/models/minimax-m1-80k/providers,false\nQwen3 Max,0.8904109589041096,/models/qwen3-max/providers,false\nQwen3 30B,0.890285071267817,/models/qwen3-30b-a3b-instruct/providers,false\nMotif-2-12.7B,0.8889967009509024,/models/motif-2-12-7b/providers,false\nMiniMax-M2,0.8888421052631579,/models/minimax-m2/providers,false\nDeepSeek R1 (Jan),0.8879208853308218,/models/deepseek-r1-0120/providers,false\nGemini 2.5 Pro,0.8866968808317782,/models/gemini-2-5-pro/providers,false\nNova 2.0 Omni,0.8858664637626023,/models/nova-2-0-omni/providers,false\nDeepSeek V3.2 Speciale,0.8851119894598155,/models/deepseek-v3-2-speciale/providers,false\nGemini 2.5 Flash (Sep),0.883131705090162,/models/gemini-2-5-flash-preview-09-2025-reasoning/providers,false\nQwen3 Omni 30B A3B,0.8824670287044221,/models/qwen3-omni-30b-a3b-reasoning/providers,false\nK-EXAONE,0.8820391227030231,/models/k-exaone/providers,false\nSolar Open 100B,0.8815023097007432,/models/solar-open-100b-reasoning/providers,false\nGemini 3 Pro Preview (high),0.8798993167925206,/models/gemini-3-pro/providers,false\nRing-1T,0.879114055061064,/models/ring-1t/providers,false\nGPT-5 (minimal),0.8777192580719029,/models/gpt-5-minimal/providers,false\nQwen3 Max (Preview),0.8770974068424493,/models/qwen3-max-preview/providers,false\nGPT-5 nano (minimal),0.8770214366303122,/models/gpt-5-nano-minimal/providers,false\nGPT-5 mini (minimal),0.8769601930036188,/models/gpt-5-mini-minimal/providers,false\nLlama 4 Maverick,0.8757899324471562,/models/llama-4-maverick/providers,false\nFalcon-H1R-7B,0.8740354938271605,/models/falcon-h1r-7b/providers,false\nGranite 4.0 H Small,0.8725207009435779,/models/granite-4-0-h-small/providers,false\nGemini 2.5 Flash-Lite,0.872211350293542,/models/gemini-2-5-flash-lite/providers,false\nGemini 3 Pro Preview (low),0.8718740351960481,/models/gemini-3-pro-low/providers,false\nLlama 3 70B,0.8706237424547284,/models/llama-3-instruct-70b/providers,false\nDevstral Small 2,0.8688492452460302,/models/devstral-small-2/providers,false\nDeepSeek V3.1 Terminus,0.8684040491061813,/models/deepseek-v3-1-terminus/providers,false\nQwen3 Next 80B A3B,0.8681475443244345,/models/qwen3-next-80b-a3b-reasoning/providers,false\no3,0.8674634794156707,/models/o3/providers,false\nNova 2.0 Pro Preview (low),0.8666947901286648,/models/nova-2-0-pro-reasoning-low/providers,false\nINTELLECT-3,0.8666802776643528,/models/intellect-3/providers,false\nQwen3 235B,0.8661829907295445,/models/qwen3-235b-a22b-instruct/providers,false\nGemini 2.5 Flash-Lite (Sep),0.8660498793242156,/models/gemini-2-5-flash-lite-preview-09-2025-reasoning/providers,false\nLFM2.5-1.2B-Instruct,0.8653264543675503,/models/lfm2-5-1-2b-instruct/providers,false\nGemma 3 1B,0.8652983787512936,/models/gemma-3-1b/providers,false\nGrok 4 Fast,0.864098982239074,/models/grok-4-fast/providers,false\nOlmo 3 32B Think,0.8630110443712459,/models/olmo-3-32b-think/providers,false\nDeepSeek R1 0528 Qwen3 8B,0.8627230046948356,/models/deepseek-r1-qwen3-8b/providers,false\nEXAONE 4.0 32B,0.8625,/models/exaone-4-0-32b-reasoning/providers,false\nQwen3 30B A3B 2507,0.8623817034700315,/models/qwen3-30b-a3b-2507-reasoning/providers,false\nGrok 3,0.8623221661312529,/models/grok-3/providers,false\nNova 2.0 Lite (low),0.8612612612612612,/models/nova-2-0-lite-reasoning-low/providers,false\ngpt-oss-20B (low),0.8605908476539873,/models/gpt-oss-20b-low/providers,false\nDevstral Small (May),0.8582908663269306,/models/devstral-small-2505/providers,false\nSeed-OSS-36B-Instruct,0.8572580645161291,/models/seed-oss-36b-instruct/providers,false\nERNIE 5.0 Thinking Preview,0.8533304404426123,/models/ernie-5-0-thinking-preview/providers,false\nGemini 2.0 Flash,0.8531992296169484,/models/gemini-2-0-flash/providers,false\nDeepSeek V3 0324,0.8518438177874187,/models/deepseek-v3-0324/providers,false\nNova 2.0 Lite,0.8506630789928887,/models/nova-2-0-lite/providers,false\nQwen2.5 72B,0.8481139646869984,/models/qwen2-5-72b-instruct/providers,false\nMistral Large 3,0.8473465822231928,/models/mistral-large-3/providers,false\nDeepSeek V3.1,0.8472758472758473,/models/deepseek-v3-1/providers,false\nDevstral 2,0.8443474646716542,/models/devstral-2/providers,false\nQwen3 VL 235B A22B,0.8424470982610518,/models/qwen3-vl-235b-a22b-reasoning/providers,false\nNova 2.0 Omni (low),0.8407294832826747,/models/nova-2-0-omni-reasoning-low/providers,false\nApriel-v1.5-15B-Thinker,0.8400236127508854,/models/apriel-v1-5-15b-thinker/providers,false\nGranite 4.0 H 1B,0.8359361291454641,/models/granite-4-0-h-nano-1b/providers,false\nDeepSeek R1 0528,0.8336082960169692,/models/deepseek-r1/providers,false\nGLM-4.5V,0.8332637729549248,/models/glm-4-5v-reasoning/providers,false\nQwen3 VL 32B,0.8322040653646872,/models/qwen3-vl-32b-reasoning/providers,false\nHermes 4 70B,0.8259634888438134,/models/hermes-4-llama-3-1-70b/providers,false\nNVIDIA Nemotron 3 Nano,0.8253144340187663,/models/nvidia-nemotron-3-nano-30b-a3b-reasoning/providers,false\nMistral Medium 3.1,0.824799506477483,/models/mistral-medium-3-1/providers,false\nEXAONE 4.0 32B,0.8242042931162102,/models/exaone-4-0-32b/providers,false\nGPT-4.1 nano,0.8224727689661762,/models/gpt-4-1-nano/providers,false\nDeepSeek V3.2,0.8192771084337349,/models/deepseek-v3-2-reasoning/providers,false\nStep3 VL 10B,0.8187689969604863,/models/step3-vl-10b/providers,false\nGPT-5 (medium),0.8163428267234496,/models/gpt-5-medium/providers,false\nDeepSeek V3.1,0.816179879462216,/models/deepseek-v3-1-reasoning/providers,false\nQwen3 32B,0.8143712574850299,/models/qwen3-32b-instruct-reasoning/providers,false\nGPT-5 (high),0.8099375509095845,/models/gpt-5/providers,false\nLlama Nemotron Ultra,0.8094059405940595,/models/llama-3-1-nemotron-ultra-253b-v1-reasoning/providers,false\nDeepSeek R1 Distill Llama 70B,0.808997955010225,/models/deepseek-r1-distill-llama-70b/providers,false\nGrok 4.1 Fast,0.8089865399841647,/models/grok-4-1-fast/providers,false\no3-mini (high),0.8073068893528184,/models/o3-mini-high/providers,false\nDeepSeek V3.2 Exp,0.8060246462802373,/models/deepseek-v3-2-reasoning-0925/providers,false\nK2-V2 (medium),0.8050339592489013,/models/k2-v2-medium/providers,false\nLlama 3.2 11B (Vision),0.8027286135693216,/models/llama-3-2-instruct-11b-vision/providers,false\nMistral 7B,0.8015170670037927,/models/mistral-7b-instruct/providers,false\nHyperCLOVA X SEED Think (32B),0.8011060635986569,/models/hyperclova-x-seed-think-32b/providers,false\nQwen3 30B,0.7995668438669029,/models/qwen3-30b-a3b-instruct-reasoning/providers,false\nHermes 4 405B,0.7939151676660005,/models/hermes-4-llama-3-1-405b/providers,false\nPhi-4,0.7936447166921899,/models/phi-4/providers,false\no4-mini (high),0.7897368993259404,/models/o4-mini/providers,false\nLFM2 1.2B,0.7897212543554007,/models/lfm2-1-2b/providers,false\nCogito v2.1,0.7883040935672515,/models/cogito-v2-1-reasoning/providers,false\nQwen3 Coder 30B A3B,0.7875098193244304,/models/qwen3-coder-30b-a3b-instruct/providers,false\nNova 2.0 Pro Preview,0.7872424722662441,/models/nova-2-0-pro/providers,false\nLlama 4 Scout,0.7869235259778167,/models/llama-4-scout/providers,false\nGrok Code Fast 1,0.786839266450917,/models/grok-code-fast-1/providers,false\nDoubao Seed Code,0.7854640980735552,/models/doubao-seed-code/providers,false\nLlama 3.1 70B,0.7799917830731307,/models/llama-3-1-instruct-70b/providers,false\nGPT-5.2 (xhigh),0.7786302926967889,/models/gpt-5-2/providers,false\nMinistral 3 3B,0.7780589192120008,/models/ministral-3-3b/providers,false\nMistral Small 3.1,0.7761135965765416,/models/mistral-small-3-1/providers,false\nNova Pro,0.7754532775453278,/models/nova-pro/providers,false\nQwen3 4B 2507,0.7753969772335948,/models/qwen3-4b-2507-instruct-reasoning/providers,false\nGPT-5 (low),0.7742864624247185,/models/gpt-5-low/providers,false\nGemini 2.5 Flash-Lite,0.7737329042638778,/models/gemini-2-5-flash-lite-reasoning/providers,false\nQwen3 235B,0.7678137651821862,/models/qwen3-235b-a22b-instruct-reasoning/providers,false\nDevstral Small,0.7660275033895022,/models/devstral-small/providers,false\nMistral Small 3.2,0.7647744945567652,/models/mistral-small-3-2/providers,false\nQwen3 235B 2507,0.7640040444893832,/models/qwen3-235b-a22b-instruct-2507/providers,false\nPhi-4 Mini,0.7633670520231214,/models/phi-4-mini/providers,false\nCommand A,0.7611852433281004,/models/command-a/providers,false\nMagistral Small 1,0.7582240161453078,/models/magistral-small/providers,false\nLlama Nemotron Super 49B v1.5,0.7572989076464747,/models/llama-nemotron-super-49b-v1-5-reasoning/providers,false\nK2-V2 (low),0.7558915946582875,/models/k2-v2-low/providers,false\nKimi K2,0.7535938903863432,/models/kimi-k2/providers,false\nQwen3 14B,0.7509221510386332,/models/qwen3-14b-instruct-reasoning/providers,false\nQwen3 4B 2507,0.7454614220877458,/models/qwen3-4b-2507-instruct/providers,false\nKimi K2 Thinking,0.7439943476212906,/models/kimi-k2-thinking/providers,false\nGPT-5 Codex (high),0.7435082140964494,/models/gpt-5-codex/providers,false\nGemini 2.5 Flash,0.7423435419440746,/models/gemini-2-5-flash-reasoning/providers,false\nClaude Opus 4.5,0.7422258592471358,/models/claude-opus-4-5/providers,false\nJamba Reasoning 3B,0.7421540656205421,/models/jamba-reasoning-3b/providers,false\nNVIDIA Nemotron Nano 9B V2,0.7411139611579333,/models/nvidia-nemotron-nano-9b-v2/providers,false\nDeepSeek V3.1 Terminus,0.7395881006864988,/models/deepseek-v3-1-terminus-reasoning/providers,false\nLlama 3.3 Nemotron Super 49B,0.7341626794258374,/models/llama-3-3-nemotron-super-49b/providers,false\nGPT-5.1 Codex (high),0.7312045270816492,/models/gpt-5-1-codex/providers,false\nMiMo-V2-Flash,0.7279426409081856,/models/mimo-v2-flash/providers,false\nOlmo 3.1 32B Instruct,0.725567435753142,/models/olmo-3-1-32b-instruct/providers,false\nGrok 4.1 Fast,0.7174291938997821,/models/grok-4-1-fast-reasoning/providers,false\nGPT-5.2 Codex (xhigh),0.7148659626320065,/models/gpt-5-2-codex/providers,false\nNova Premier,0.707261880271549,/models/nova-premier/providers,false\nGranite 4.0 350M,0.7006060606060606,/models/granite-4-0-350m/providers,false\nGLM-4.5,0.694614711033275,/models/glm-4.5/providers,false\nKimi K2 0905,0.6895568231680561,/models/kimi-k2-0905/providers,false\nLlama 3.1 Nemotron 70B,0.6888933121019108,/models/llama-3-1-nemotron-instruct-70b/providers,false\no1,0.6871884346959123,/models/o1/providers,false\nMistral Large 2 (Nov),0.6751133086114545,/models/mistral-large-2/providers,false\nGrok 4 Fast,0.6737922188969645,/models/grok-4-fast-reasoning/providers,false\nGLM-4.6,0.6711956521739131,/models/glm-4-6/providers,false\nMiniMax-M2.1,0.6655260906757913,/models/minimax-m2-1/providers,false\nERNIE 4.5 300B A47B,0.665314401622718,/models/ernie-4-5-300b-a47b/providers,false\nGLM-4.6V,0.6638,/models/glm-4-6v/providers,false\nLlama 3.2 1B,0.6635747413485551,/models/llama-3-2-instruct-1b/providers,false\nLlama Nemotron Super 49B v1.5,0.6632768361581921,/models/llama-nemotron-super-49b-v1-5/providers,false\nKAT-Coder-Pro V1,0.6623058053965658,/models/kat-coder-pro-v1/providers,false\nGemini 2.5 Flash-Lite (Sep),0.6585881900365455,/models/gemini-2-5-flash-lite-preview-09-2025/providers,false\nNova Micro,0.6478484737035675,/models/nova-micro/providers,false\nKimi K2.5,0.6429101707498144,/models/kimi-k2-5/providers,false\nGrok 4,0.638996138996139,/models/grok-4/providers,false\nLFM2 2.6B,0.6313388313739252,/models/lfm2-2-6b/providers,false\nOlmo 3.1 32B Think,0.6250485436893204,/models/olmo-3-1-32b-think/providers,false\nDevstral Medium,0.6228105906313646,/models/devstral-medium/providers,false\nNVIDIA Nemotron Nano 9B V2,0.6043689320388349,/models/nvidia-nemotron-nano-9b-v2-reasoning/providers,false\nMistral Medium 3,0.6035872632003224,/models/mistral-medium-3/providers,false\nGPT-5.2,0.6016655100624566,/models/gpt-5-2-non-reasoning/providers,false\nMagistral Medium 1.2,0.5970802919708029,/models/magistral-medium-2509/providers,false\nGPT-5.2 (medium),0.5930713547052741,/models/gpt-5-2-medium/providers,false\nMagistral Medium 1,0.5890751086281812,/models/magistral-medium/providers,false\nGPT-5 nano (high),0.5865796451152355,/models/gpt-5-nano/providers,false\nNova Lite,0.5829810696563131,/models/nova-lite/providers,false\nClaude Opus 4.5,0.5780837972458248,/models/claude-opus-4-5-thinking/providers,false\nGPT-5 mini (high),0.5527909995672868,/models/gpt-5-mini/providers,false\nGPT-4o (Aug),0.5386559932589003,/models/gpt-4o-2024-08-06/providers,false\nGPT-5 nano (medium),0.523021726131154,/models/gpt-5-nano-medium/providers,false\nK2 Think V2,0.5203459799488893,/models/k2-think-v2/providers,false\nClaude 3.7 Sonnet,0.5199726089933805,/models/claude-3-7-sonnet/providers,false\nClaude 4.5 Sonnet,0.5143704379562044,/models/claude-4-5-sonnet/providers,false\nLlama 3.1 405B,0.511727078891258,/models/llama-3-1-instruct-405b/providers,false\nGPT-5.1 (high),0.5115919629057187,/models/gpt-5-1/providers,false\nGPT-5.1 Codex mini (high),0.5112862010221465,/models/gpt-5-1-codex-mini/providers,false\nGLM-4.6V,0.48879716981132076,/models/glm-4-6v-reasoning/providers,false\nKimi K2.5,0.4865096359743041,/models/kimi-k2-5-non-reasoning/providers,false\nClaude 4.1 Opus,0.4838709677419355,/models/claude-4-1-opus-thinking/providers,false\nClaude 4.5 Sonnet,0.47732754462132176,/models/claude-4-5-sonnet-thinking/providers,false\nQwen3 Coder 480B,0.44877288663809894,/models/qwen3-coder-480b-a35b-instruct/providers,false\nGPT-5 mini (medium),0.43377063055438003,/models/gpt-5-mini-medium/providers,false\nLlama 3.1 8B,0.4296577946768061,/models/llama-3-1-instruct-8b/providers,false\nClaude 3.5 Haiku,0.41132723112128144,/models/claude-3-5-haiku/providers,false\nClaude 4 Sonnet,0.40504986208359856,/models/claude-4-sonnet/providers,false\nClaude 3.7 Sonnet,0.3891670459717797,/models/claude-3-7-sonnet-thinking/providers,false\nGPT-4o (Nov),0.37766393442622953,/models/gpt-4o/providers,false\nClaude 4 Sonnet,0.28900147772852014,/models/claude-4-sonnet-thinking/providers,false\nClaude 4.5 Haiku,0.2606880095446411,/models/claude-4-5-haiku-reasoning/providers,false\nGrok 3 mini Reasoning (high),0.25321637426900584,/models/grok-3-mini-reasoning/providers,false\nClaude 4.5 Haiku,0.2467757459095284,/models/claude-4-5-haiku/providers,false"}

Detailed Domain Results

Artificial Analysis Omniscience Index Across Domains (Normalized)

Artificial Analysis Omniscience Index; Scores are normalized per domain across all models tested, where green represents the highest score for that domain and red represents the lowest score for that domain.

Reasoning model

{"@context":"https://schema.org","@type":"Dataset","name":"Artificial Analysis Omniscience Index Across Domains (Normalized)","creator":{"@type":"Organization","name":"Artificial Analysis","url":"https://artificialanalysis.ai"},"description":"Artificial Analysis Omniscience Index; Scores are normalized per domain across all models tested, where green represents the highest score for that domain and red represents the lowest score for that domain.","measurementTechnique":"Independent test run by Artificial Analysis on dedicated hardware.","spatialCoverage":"Worldwide","keywords":["analytics","llm","AI","benchmark","model","gpt","claude"],"license":"https://creativecommons.org/licenses/by/4.0/","isAccessibleForFree":true,"citation":"Artificial Analysis (2025). LLM benchmarks dataset. https://artificialanalysis.ai","data":"label,omniscience,detailsUrl\nGemini 3 Pro Preview (high),[object Object],/models/gemini-3-pro/providers\nClaude Opus 4.5 (Reasoning),[object Object],/models/claude-opus-4-5-thinking/providers\nGemini 3 Flash Preview (Reasoning),[object Object],/models/gemini-3-flash-reasoning/providers\nClaude 4.1 Opus (Reasoning),[object Object],/models/claude-4-1-opus-thinking/providers\nGrok 4,[object Object],/models/grok-4/providers\nClaude 4.5 Sonnet (Reasoning),[object Object],/models/claude-4-5-sonnet-thinking/providers\nGPT-5.2 (xhigh),[object Object],/models/gpt-5-2/providers\nGPT-5.2 Codex (xhigh),[object Object],/models/gpt-5-2-codex/providers\nKimi K2.5 (Reasoning),[object Object],/models/kimi-k2-5/providers\nLlama 3.1 Instruct 405B,[object Object],/models/llama-3-1-instruct-405b/providers\nDeepSeek V3.2 (Reasoning),[object Object],/models/deepseek-v3-2-reasoning/providers\nKimi K2 Thinking,[object Object],/models/kimi-k2-thinking/providers\nK2 Think V2,[object Object],/models/k2-think-v2/providers\nMiniMax-M2.1,[object Object],/models/minimax-m2-1/providers\nGrok 4.1 Fast (Reasoning),[object Object],/models/grok-4-1-fast-reasoning/providers\nKAT-Coder-Pro V1,[object Object],/models/kat-coder-pro-v1/providers\nGLM-4.7 (Reasoning),[object Object],/models/glm-4-7/providers\nMistral Large 3,[object Object],/models/mistral-large-3/providers\nMiMo-V2-Flash (Reasoning),[object Object],/models/mimo-v2-flash-reasoning/providers\nLlama 4 Maverick,[object Object],/models/llama-4-maverick/providers\nQwen3 235B A22B 2507 (Reasoning),[object Object],/models/qwen3-235b-a22b-instruct-2507-reasoning/providers\nNova 2.0 Pro Preview (medium),[object Object],/models/nova-2-0-pro-reasoning-medium/providers\ngpt-oss-120B (high),[object Object],/models/gpt-oss-120b/providers\nHyperCLOVA X SEED Think (32B),[object Object],/models/hyperclova-x-seed-think-32b/providers\nNVIDIA Nemotron 3 Nano 30B A3B (Reasoning),[object Object],/models/nvidia-nemotron-3-nano-30b-a3b-reasoning/providers\nMi:dm K 2.5 Pro,[object Object],/models/mi-dm-k-2-5-pro-dec28/providers\nK-EXAONE (Reasoning),[object Object],/models/k-exaone/providers\nFalcon-H1R-7B,[object Object],/models/falcon-h1r-7b/providers\ngpt-oss-20B (high),[object Object],/models/gpt-oss-20b/providers"}

Software Engineering Deep Dive

Software Engineering Omniscience Index Across Languages (Normalized)

Software Engineering Omniscience Index; Scores are normalized per language across all models tested, where green represents the highest score for that language and red represents the lowest score for that language.

Reasoning model

{"@context":"https://schema.org","@type":"Dataset","name":"Software Engineering Omniscience Index Across Languages (Normalized)","creator":{"@type":"Organization","name":"Artificial Analysis","url":"https://artificialanalysis.ai"},"description":"Software Engineering Omniscience Index; Scores are normalized per language across all models tested, where green represents the highest score for that language and red represents the lowest score for that language.","measurementTechnique":"Independent test run by Artificial Analysis on dedicated hardware.","spatialCoverage":"Worldwide","keywords":["analytics","llm","AI","benchmark","model","gpt","claude"],"license":"https://creativecommons.org/licenses/by/4.0/","isAccessibleForFree":true,"citation":"Artificial Analysis (2025). LLM benchmarks dataset. https://artificialanalysis.ai","data":"label,omniscience,detailsUrl\nGemini 3 Pro Preview (high),[object Object],/models/gemini-3-pro/providers\nClaude Opus 4.5 (Reasoning),[object Object],/models/claude-opus-4-5-thinking/providers\nGemini 3 Flash Preview (Reasoning),[object Object],/models/gemini-3-flash-reasoning/providers\nClaude 4.1 Opus (Reasoning),[object Object],/models/claude-4-1-opus-thinking/providers\nGrok 4,[object Object],/models/grok-4/providers\nClaude 4.5 Sonnet (Reasoning),[object Object],/models/claude-4-5-sonnet-thinking/providers\nGPT-5.2 (xhigh),[object Object],/models/gpt-5-2/providers\nGPT-5.2 Codex (xhigh),[object Object],/models/gpt-5-2-codex/providers\nKimi K2.5 (Reasoning),[object Object],/models/kimi-k2-5/providers\nLlama 3.1 Instruct 405B,[object Object],/models/llama-3-1-instruct-405b/providers\nDeepSeek V3.2 (Reasoning),[object Object],/models/deepseek-v3-2-reasoning/providers\nKimi K2 Thinking,[object Object],/models/kimi-k2-thinking/providers\nK2 Think V2,[object Object],/models/k2-think-v2/providers\nMiniMax-M2.1,[object Object],/models/minimax-m2-1/providers\nGrok 4.1 Fast (Reasoning),[object Object],/models/grok-4-1-fast-reasoning/providers\nKAT-Coder-Pro V1,[object Object],/models/kat-coder-pro-v1/providers\nGLM-4.7 (Reasoning),[object Object],/models/glm-4-7/providers\nMistral Large 3,[object Object],/models/mistral-large-3/providers\nMiMo-V2-Flash (Reasoning),[object Object],/models/mimo-v2-flash-reasoning/providers\nLlama 4 Maverick,[object Object],/models/llama-4-maverick/providers\nQwen3 235B A22B 2507 (Reasoning),[object Object],/models/qwen3-235b-a22b-instruct-2507-reasoning/providers\nNova 2.0 Pro Preview (medium),[object Object],/models/nova-2-0-pro-reasoning-medium/providers\ngpt-oss-120B (high),[object Object],/models/gpt-oss-120b/providers\nHyperCLOVA X SEED Think (32B),[object Object],/models/hyperclova-x-seed-think-32b/providers\nNVIDIA Nemotron 3 Nano 30B A3B (Reasoning),[object Object],/models/nvidia-nemotron-3-nano-30b-a3b-reasoning/providers\nMi:dm K 2.5 Pro,[object Object],/models/mi-dm-k-2-5-pro-dec28/providers\nK-EXAONE (Reasoning),[object Object],/models/k-exaone/providers\nFalcon-H1R-7B,[object Object],/models/falcon-h1r-7b/providers\ngpt-oss-20B (high),[object Object],/models/gpt-oss-20b/providers"}

Omniscience Index Question Breakdown

Distribution of questions across different domains and subdomains in the Omniscience benchmark

Business

Humanities & Social Sciences

Science, Engineering & Mathematics

Health

Law

Software Engineering (SWE)

Model Size (Open Weights Models Only)

AA-Omniscience Index vs. Total Parameters

AA-Omniscience Index; Size in Parameters (Billions)

Most attractive quadrant

Alibaba

DeepSeek

Kimi

Korea Telecom

LG AI Research

MBZUAI Institute of Foundation Models

Meta

MiniMax

Mistral

Naver

NVIDIA

OpenAI

TII UAE

Xiaomi

Z AI

The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.

AA-Omniscience Accuracy vs. Total Parameters

AA-Omniscience Accuracy; Size in Parameters (Billions)

Most attractive quadrant

Alibaba

DeepSeek

Kimi

Korea Telecom

LG AI Research

MBZUAI Institute of Foundation Models

Meta

MiniMax

Mistral

Naver

NVIDIA

OpenAI

TII UAE

Xiaomi

Z AI

AA-Omniscience Accuracy (higher is better) measures the proportion of correctly answered questions out of all questions, regardless of whether the model chooses to answer

The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.

AA-Omniscience Hallucination Rate vs. Total Parameters

AA-Omniscience Hallucination Rate; Size in Parameters (Billions)

Most attractive quadrant

Alibaba

DeepSeek

Kimi

Korea Telecom

LG AI Research

MBZUAI Institute of Foundation Models

Meta

MiniMax

Mistral

Naver

NVIDIA

OpenAI

TII UAE

Xiaomi

Z AI

The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.

AA-Omniscience Index: Token Usage

Tokens used to run the evaluation

Input tokens

Reasoning tokens

Answer tokens

The total number of tokens used to run the evaluation, including input tokens (prompt), reasoning tokens (for reasoning models), and answer tokens (final response).

AA-Omniscience Index: Cost Breakdown

Cost (USD) to run the evaluation

Input cost

Reasoning cost

Answer cost

The cost to run the evaluation, calculated using the model's input and output token pricing and the number of tokens used.

AA-Omniscience Index: Score vs. Release Date

Most attractive region

Alibaba

Amazon

Anthropic

DeepSeek

Google

Kimi

Korea Telecom

KwaiKAT

LG AI Research

MBZUAI Institute of Foundation Models

Meta

MiniMax

Mistral

Naver

NVIDIA

OpenAI

TII UAE

xAI

Xiaomi

Z AI

Example Problems

Explore Evaluations

Artificial Analysis Intelligence Index

A composite benchmark aggregating ten challenging evaluations to provide a holistic measure of AI capabilities across mathematics, science, coding, and reasoning.

GDPval-AA Leaderboard

GDPval-AA is Artificial Analysis' evaluation framework for OpenAI's GDPval dataset. It tests AI models on real-world tasks across 44 occupations and 9 major industries. Models are given shell access and web browsing capabilities in an agentic loop to solve tasks, with ELO ratings derived from blind pairwise comparisons.

AA-Omniscience: Knowledge and Hallucination Benchmark

A benchmark measuring factual recall and hallucination across various economically relevant domains.

Artificial Analysis Openness Index

A composite measure providing an industry standard to communicate model openness for users and developers.

MMLU-Pro Benchmark Leaderboard

An enhanced version of MMLU with 12,000 graduate-level questions across 14 subject areas, featuring ten answer options and deeper reasoning requirements.

Global-MMLU-Lite Benchmark Leaderboard

A lightweight, multilingual version of MMLU, designed to evaluate knowledge and reasoning skills across a diverse range of languages and cultural contexts.

GPQA Diamond Benchmark Leaderboard

The most challenging 198 questions from GPQA, where PhD experts achieve 65% accuracy but skilled non-experts only reach 34% despite web access.

Humanity's Last Exam Benchmark Leaderboard

A frontier-level benchmark with 2,500 expert-vetted questions across mathematics, sciences, and humanities, designed to be the final closed-ended academic evaluation.

LiveCodeBench Benchmark Leaderboard

A contamination-free coding benchmark that continuously harvests fresh competitive programming problems from LeetCode, AtCoder, and CodeForces, evaluating code generation, self-repair, and execution.

SciCode Benchmark Leaderboard

A scientist-curated coding benchmark featuring 338 sub-tasks derived from 80 genuine laboratory problems across 16 scientific disciplines.

MATH-500 Benchmark Leaderboard

A 500-problem subset from the MATH dataset, featuring competition-level mathematics across six domains including algebra, geometry, and number theory.

IFBench Benchmark Leaderboard

A benchmark evaluating precise instruction-following generalization on 58 diverse, verifiable out-of-domain constraints that test models' ability to follow specific output requirements.

AIME 2025 Benchmark Leaderboard

All 30 problems from the 2025 American Invitational Mathematics Examination, testing olympiad-level mathematical reasoning with integer answers from 000-999.

CritPt Benchmark Leaderboard

A benchmark designed to test LLMs on research-level physics reasoning tasks, featuring 71 composite research challenges.

Terminal-Bench Hard Benchmark Leaderboard

An agentic benchmark evaluating AI capabilities in terminal environments through software engineering, system administration, and data processing tasks.

𝜏²-Bench Telecom Benchmark Leaderboard

A dual-control conversational AI benchmark simulating technical support scenarios where both agent and user must coordinate actions to resolve telecom service issues.

Artificial Analysis Long Context Reasoning Benchmark Leaderboard

A challenging benchmark measuring language models' ability to extract, reason about, and synthesize information from long-form documents ranging from 10k to 100k tokens (measured using the cl100k_base tokenizer).

MMMU-Pro Benchmark Leaderboard

An enhanced MMMU benchmark that eliminates shortcuts and guessing strategies to more rigorously test multimodal models across 30 academic disciplines.

AA-Omniscience: Knowledge and Hallucination Benchmark

Background

Methodology

Related links

Highlights

AA-Omniscience Index: Results

AA-Omniscience Index

AA-Omniscience Index vs. Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index

AA-Omniscience Index

AA-Omniscience Accuracy

AA-Omniscience Accuracy

AA-Omniscience Accuracy

AA-Omniscience Hallucination Rate

AA-Omniscience Hallucination Rate

AA-Omniscience Hallucination Rate

Detailed Domain Results

Artificial Analysis Omniscience Index Across Domains (Normalized)

AA-Omniscience Index

Software Engineering Deep Dive

Software Engineering Omniscience Index Across Languages (Normalized)

AA-Omniscience Index

Omniscience Index Question Breakdown

Business

Humanities & Social Sciences

Science, Engineering & Mathematics

Health

Law

Software Engineering (SWE)

AA-Omniscience Index

Model Size (Open Weights Models Only)

AA-Omniscience Index vs. Total Parameters

AA-Omniscience Index

Total Parameters

AA-Omniscience Accuracy vs. Total Parameters

AA-Omniscience Accuracy

Total Parameters

AA-Omniscience Hallucination Rate vs. Total Parameters

AA-Omniscience Hallucination Rate

Total Parameters

AA-Omniscience Index: Token Usage

Evaluation Token Usage

AA-Omniscience Index: Cost Breakdown

Evaluation Cost

AA-Omniscience Index: Score vs. Release Date

Example Problems

Explore Evaluations