AA-Omniscience: Knowledge and Hallucination Benchmark
Publication
View on arXiv: AA-Omniscience: Evaluating Cross-Domain Knowledge Reliability in Large Language Models
Highlights
- Gemini 3.1 Pro Preview scores highest on the AA-Omniscience Index at 33, followed by Gemini 3 Pro Preview (high) at 16 and Claude Opus 4.6 (Adaptive Reasoning, Max Effort) at 14
- Gemini 3 Pro Preview (high) scores highest on AA-Omniscience Accuracy at 56%, followed by Gemini 3.1 Pro Preview at 55% and Gemini 3 Flash Preview (Reasoning) at 54%
- Claude 4.1 Opus (Reasoning) has the lowest (best) AA-Omniscience Hallucination Rate at 0%, followed by Claude 4 Opus (Reasoning), also at 0%, and Claude 4.5 Haiku (Non-reasoning) at 25%
AA-Omniscience Index: Results
AA-Omniscience Index
AA-Omniscience Index vs. Artificial Analysis Intelligence Index
Artificial Analysis Intelligence Index v4.0 includes: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.
AA-Omniscience Index (higher is better) measures knowledge reliability and hallucination. It rewards correct answers, penalizes hallucinations, and has no penalty for refusing to answer. Scores range from -100 to 100, where 0 means as many correct as incorrect answers, and negative scores mean more incorrect than correct.
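The description above does not spell out a formula. A minimal sketch of one scoring rule that is consistent with it (correct answers add, incorrect answers subtract, refusals are neutral, range -100 to 100) is shown below. The function name, the argument names, and the choice to treat partial answers as neutral are assumptions made for illustration, not the published definition.

```python
def omniscience_index(correct: int, incorrect: int,
                      partial: int = 0, not_attempted: int = 0) -> float:
    """Hypothetical index: 100 * (correct - incorrect) / total questions.

    Assumption: partial answers and refusals count toward the total
    but are neither rewarded nor penalized.
    """
    total = correct + incorrect + partial + not_attempted
    if total == 0:
        raise ValueError("no graded responses")
    return 100.0 * (correct - incorrect) / total


# As many correct as incorrect answers gives 0, regardless of refusals.
balanced = omniscience_index(correct=40, incorrect=40, not_attempted=20)

# Refusing instead of guessing wrong keeps the score positive.
cautious = omniscience_index(correct=60, incorrect=10, not_attempted=30)

# A model that refuses everything also lands on 0.
silent = omniscience_index(correct=0, incorrect=0, not_attempted=100)
```

Under this rule a model can only raise its score by answering correctly or by declining when it would otherwise answer incorrectly, which matches the stated intent of rewarding knowledge while penalizing hallucination.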
AA-Omniscience Accuracy
AA-Omniscience Accuracy (higher is better) measures the proportion of all questions that the model answers correctly. Questions the model declines to answer still count in the denominator.
{"@context":"https://schema.org","@type":"Dataset","name":"AA-Omniscience Accuracy","creator":{"@type":"Organization","name":"Artificial Analysis","url":"https://artificialanalysis.ai"},"description":"AA-Omniscience Accuracy (higher is better) measures the proportion of correctly answered questions out of all questions, regardless of whether the model chooses to answer","measurementTechnique":"Independent test run by Artificial Analysis on dedicated hardware.","spatialCoverage":"Worldwide","keywords":["analytics","llm","AI","benchmark","model","gpt","claude"],"license":"https://artificialanalysis.ai/docs/legal/Terms-of-Use.pdf","isAccessibleForFree":true,"citation":"Artificial Analysis (2025). LLM benchmarks dataset. https://artificialanalysis.ai","data":"modelName,omniscienceAccuracy,detailsUrl,isLabClaimedValue\nGemini 3 Pro Preview (high),0.559,/models/gemini-3-pro/providers,false\nGemini 3.1 Pro Preview,0.5525,/models/gemini-3-1-pro-preview/providers,false\nGemini 3 Flash,0.54,/models/gemini-3-flash-reasoning/providers,false\nGPT-5.3 Codex (xhigh),0.5178333333333334,/models/gpt-5-3-codex/providers,false\nGemini 3 Pro Preview (low),0.4706666666666667,/models/gemini-3-pro-low/providers,false\nClaude Opus 4.6 (max),0.4638333333333333,/models/claude-opus-4-6-adaptive/providers,false\nClaude Opus 4.5,0.45716666666666667,/models/claude-opus-4-5-thinking/providers,false\nGemini 3 Flash,0.45516666666666666,/models/gemini-3-flash/providers,false\nClaude Opus 4.6,0.4515,/models/claude-opus-4-6/providers,false\nGPT-5.2 (xhigh),0.438,/models/gpt-5-2/providers,false\nGrok 4,0.41383333333333333,/models/grok-4/providers,false\nClaude Opus 4.5,0.4073333333333333,/models/claude-opus-4-5/providers,false\nGPT-5.2 Codex (xhigh),0.407,/models/gpt-5-2-codex/providers,false\nGPT-5 (high),0.4065,/models/gpt-5/providers,false\nClaude Sonnet 4.6 (max),0.4003333333333333,/models/claude-sonnet-4-6-adaptive/providers,false\nGPT-5.1 Codex 
(high),0.392,/models/gpt-5-1-codex/providers,false\nGemini 2.5 Pro,0.39,/models/gemini-2-5-pro/providers,false\nGPT-5 (medium),0.38866666666666666,/models/gpt-5-medium/providers,false\nGPT-5 Codex (high),0.38666666666666666,/models/gpt-5-codex/providers,false\no3,0.384,/models/o3/providers,false\nClaude Sonnet 4.6,0.37966666666666665,/models/claude-sonnet-4-6/providers,false\nGPT-5.2 (medium),0.37733333333333335,/models/gpt-5-2-medium/providers,false\nGPT-5.1 (high),0.37583333333333335,/models/gpt-5-1/providers,false\nGPT-5 (low),0.37433333333333335,/models/gpt-5-low/providers,false\nGemini 3.1 Flash-Lite Preview,0.364,/models/gemini-3-1-flash-lite-preview/providers,false\n\"Claude Sonnet 4.6 (Non-reasoning, Low Effort)\",0.36133333333333334,/models/claude-sonnet-4-6-non-reasoning-low-effort/providers,false\no1,0.3468333333333333,/models/o1/providers,false\nKimi K2.5,0.3431666666666667,/models/kimi-k2-5/providers,false\nDeepSeek V3.2,0.33466666666666667,/models/deepseek-v3-2-reasoning/providers,false\nClaude 4.5 Sonnet,0.3238333333333333,/models/claude-4-5-sonnet-thinking/providers,false\nQwen3.5 397B A17B,0.3135,/models/qwen3-5-397b-a17b/providers,false\nDeepSeek R1 0528,0.3095,/models/deepseek-r1/providers,false\nDeepSeek R1 (Jan),0.30716666666666664,/models/deepseek-r1-0120/providers,false\nHermes 4 405B,0.3016666666666667,/models/hermes-4-llama-3-1-405b-reasoning/providers,false\nKimi K2 Thinking,0.3001666666666667,/models/kimi-k2-thinking/providers,false\nQwen3 Max Thinking,0.2991666666666667,/models/qwen3-max-thinking/providers,false\nGPT-5.2,0.29283333333333333,/models/gpt-5-2-non-reasoning/providers,false\nGLM-4.7,0.2926666666666667,/models/glm-4-7/providers,false\nGPT-5.1,0.2906666666666667,/models/gpt-5-1-non-reasoning/providers,false\nGemini 2.5 Flash (Sep),0.289,/models/gemini-2-5-flash-preview-09-2025-reasoning/providers,false\nDeepSeek V3.1,0.2881666666666667,/models/deepseek-v3-1-reasoning/providers,false\nGrok 
3,0.2831666666666667,/models/grok-3/providers,false\nClaude 3.7 Sonnet,0.283,/models/claude-3-7-sonnet/providers,false\nDeepSeek V3.2 Exp,0.2825,/models/deepseek-v3-2-reasoning-0925/providers,false\nGPT-5 (minimal),0.2806666666666667,/models/gpt-5-minimal/providers,false\nClaude 3.7 Sonnet,0.2803333333333333,/models/claude-3-7-sonnet-thinking/providers,false\nQwen3 Max Thinking (Preview),0.2795,/models/qwen3-max-thinking-preview/providers,false\nDeepSeek V3.1 Terminus,0.2793333333333333,/models/deepseek-v3-1-terminus-reasoning/providers,false\nClaude 4.5 Sonnet,0.27516666666666667,/models/claude-4-5-sonnet/providers,false\nGLM-4.6,0.27316666666666667,/models/glm-4-6-reasoning/providers,false\nGLM-5,0.26866666666666666,/models/glm-5/providers,false\nKimi K2,0.26816666666666666,/models/kimi-k2/providers,false\nGemini 2.5 Flash (Sep),0.2665,/models/gemini-2-5-flash-preview-09-2025/providers,false\nGemini 2.5 Flash,0.2653333333333333,/models/gemini-2-5-flash/providers,false\nHermes 4 405B,0.26266666666666666,/models/hermes-4-llama-3-1-405b/providers,false\nMiniMax-M2.5,0.262,/models/minimax-m2-5/providers,false\nKimi K2 0905,0.25433333333333336,/models/kimi-k2-0905/providers,false\nDeepSeek V3 (Dec),0.25366666666666665,/models/deepseek-v3/providers,false\nGrok 4.1 Fast,0.25333333333333335,/models/grok-4-1-fast-reasoning/providers,false\nGemini 2.5 Flash,0.25083333333333335,/models/gemini-2-5-flash-reasoning/providers,false\nDoubao Seed Code,0.25066666666666665,/models/doubao-seed-code/providers,false\nGLM-4.5,0.2495,/models/glm-4.5/providers,false\nMiMo-V2-Flash,0.24916666666666668,/models/mimo-v2-flash-reasoning/providers,false\nQwen3.5 122B A10B,0.24733333333333332,/models/qwen3-5-122b-a10b/providers,false\no4-mini (high),0.24633333333333332,/models/o4-mini/providers,false\nQwen3 Max,0.24433333333333335,/models/qwen3-max/providers,false\nQwen3.5 397B A17B,0.24333333333333335,/models/qwen3-5-397b-a17b-non-reasoning/providers,false\nLlama 4 
Maverick,0.24316666666666667,/models/llama-4-maverick/providers,false\nQwen3 Max (Preview),0.24283333333333335,/models/qwen3-max-preview/providers,false\nGPT-4.1,0.24183333333333334,/models/gpt-4-1/providers,false\nDeepSeek V3.2,0.24166666666666667,/models/deepseek-v3-2/providers,false\nMistral Large 3,0.24116666666666667,/models/mistral-large-3/providers,false\nGPT-5 mini (high),0.2395,/models/gpt-5-mini/providers,false\nGLM-4.7,0.23916666666666667,/models/glm-4-7-non-reasoning/providers,false\nGrok Code Fast 1,0.23816666666666667,/models/grok-code-fast-1/providers,false\nGPT-4o (Aug),0.23666666666666666,/models/gpt-4o-2024-08-06/providers,false\nGemini 2.0 Flash,0.23583333333333334,/models/gemini-2-0-flash/providers,false\nDeepSeek V3.1 Terminus,0.2345,/models/deepseek-v3-1-terminus/providers,false\nHermes 4 70B,0.23283333333333334,/models/hermes-4-llama-3-1-70b-reasoning/providers,false\nDeepSeek V3.1,0.23133333333333334,/models/deepseek-v3-1/providers,false\nQwen3 235B A22B 2507,0.23083333333333333,/models/qwen3-235b-a22b-instruct-2507-reasoning/providers,false\nKimi K2.5,0.229,/models/kimi-k2-5-non-reasoning/providers,false\nGPT-5.1 Codex mini (high),0.22716666666666666,/models/gpt-5-1-codex-mini/providers,false\nDeepSeek V3.2 Exp,0.227,/models/deepseek-v3-2-0925/providers,false\nGLM-5,0.22683333333333333,/models/glm-5-non-reasoning/providers,false\nGrok 4 Fast,0.22633333333333333,/models/grok-4-fast-reasoning/providers,false\nClaude 4 Sonnet,0.22433333333333333,/models/claude-4-sonnet/providers,false\nClaude 4 Sonnet,0.22333333333333333,/models/claude-4-sonnet-thinking/providers,false\nGPT-5 mini (medium),0.222,/models/gpt-5-mini-medium/providers,false\nDeepSeek V3 0324,0.22166666666666668,/models/deepseek-v3-0324/providers,false\nNova 2.0 Pro Preview (low),0.22133333333333333,/models/nova-2-0-pro-reasoning-low/providers,false\nMiniMax-M2,0.22016666666666668,/models/minimax-m2/providers,false\nNova 2.0 Pro Preview 
(medium),0.21983333333333333,/models/nova-2-0-pro-reasoning-medium/providers,false\nERNIE 5.0 Thinking Preview,0.21916666666666668,/models/ernie-5-0-thinking-preview/providers,false\ngpt-oss-120B (high),0.21516666666666667,/models/gpt-oss-120b/providers,false\nMiniMax M1 80k,0.21083333333333334,/models/minimax-m1-80k/providers,false\no3-mini (high),0.21033333333333334,/models/o3-mini-high/providers,false\nDevstral 2,0.21016666666666667,/models/devstral-2/providers,false\nQwen3.5 27B,0.20966666666666667,/models/qwen3-5-27b/providers,false\nMagistral Medium 1.2,0.2085,/models/magistral-medium-2509/providers,false\nGLM-4.6,0.20783333333333334,/models/glm-4-6/providers,false\nQwen3 VL 235B A22B,0.20766666666666667,/models/qwen3-vl-235b-a22b-reasoning/providers,false\nMagistral Medium 1,0.20716666666666667,/models/magistral-medium/providers,false\nMercury 2,0.20483333333333334,/models/mercury-2/providers,false\nMiniMax-M2.1,0.20466666666666666,/models/minimax-m2-1/providers,false\nQwen3.5 35B A3B,0.2045,/models/qwen3-5-35b-a3b/providers,false\nMiMo-V2-Flash (Feb 2026),0.20166666666666666,/models/mimo-v2-0206/providers,false\nQwen3 VL 235B A22B,0.20166666666666666,/models/qwen3-vl-235b-a22b-instruct/providers,false\nQwen3 VL 8B,0.2015,/models/qwen3-vl-8b-instruct/providers,false\nMistral Large 2 (Nov),0.20133333333333334,/models/mistral-large-2/providers,false\nQwen3 VL 8B,0.20083333333333334,/models/qwen3-vl-8b-reasoning/providers,false\nLlama Nemotron Ultra,0.199,/models/llama-3-1-nemotron-ultra-253b-v1-reasoning/providers,false\nLing-1T,0.1985,/models/ling-1t/providers,false\nRing-1T,0.198,/models/ring-1t/providers,false\nMistral Medium 3.1,0.19766666666666666,/models/mistral-medium-3-1/providers,false\nGPT-4o (Nov),0.19683333333333333,/models/gpt-4o/providers,false\nSolar Pro 2,0.19483333333333333,/models/solar-pro-2-reasoning/providers,false\nLlama 3.1 
70B,0.19383333333333333,/models/llama-3-1-instruct-70b/providers,false\nINTELLECT-3,0.19233333333333333,/models/intellect-3/providers,false\nNova Premier,0.19116666666666668,/models/nova-premier/providers,false\nDevstral Medium,0.1905,/models/devstral-medium/providers,false\nERNIE 4.5 300B A47B,0.18683333333333332,/models/ernie-4-5-300b-a47b/providers,false\nQwen3.5 122B A10B,0.186,/models/qwen3-5-122b-a10b-non-reasoning/providers,false\nNova 2.0 Omni (low),0.18466666666666667,/models/nova-2-0-omni-reasoning-low/providers,false\nQwen3 235B,0.184,/models/qwen3-235b-a22b-instruct-reasoning/providers,false\nMistral Medium 3,0.183,/models/mistral-medium-3/providers,false\nGPT-5 nano (high),0.18283333333333332,/models/gpt-5-nano/providers,false\nNova 2.0 Lite (medium),0.18283333333333332,/models/nova-2-0-lite-reasoning-medium/providers,false\nK2-V2 (high),0.18283333333333332,/models/k2-v2/providers,false\nQwen3 235B,0.182,/models/qwen3-235b-a22b-instruct/providers,false\nHermes 4 70B,0.18183333333333335,/models/hermes-4-llama-3-1-70b/providers,false\nGPT-5 mini (minimal),0.18183333333333335,/models/gpt-5-mini-minimal/providers,false\nSolar Open 100B,0.18083333333333335,/models/solar-open-100b-reasoning/providers,false\nGemini 2.5 Flash-Lite (Sep),0.18033333333333335,/models/gemini-2-5-flash-lite-preview-09-2025-reasoning/providers,false\nLlama 3.3 70B,0.17933333333333334,/models/llama-3-3-instruct-70b/providers,false\nGLM-4.5V,0.179,/models/glm-4-5v/providers,false\nNova 2.0 Omni (medium),0.17883333333333334,/models/nova-2-0-omni-reasoning-medium/providers,false\nSeed-OSS-36B-Instruct,0.1775,/models/seed-oss-36b-instruct/providers,false\nQwen3 Next 80B A3B,0.17633333333333334,/models/qwen3-next-80b-a3b-instruct/providers,false\nGemini 2.5 Flash-Lite,0.1755,/models/gemini-2-5-flash-lite-reasoning/providers,false\nGLM-4.6V,0.175,/models/glm-4-6v/providers,false\nGPT-4.1 mini,0.175,/models/gpt-4-1-mini/providers,false\nKAT-Coder-Pro 
V1,0.17483333333333334,/models/kat-coder-pro-v1/providers,false\nClaude 4.5 Haiku,0.17433333333333334,/models/claude-4-5-haiku-reasoning/providers,false\nK2-V2 (medium),0.17366666666666666,/models/k2-v2-medium/providers,false\nQwen3 Next 80B A3B,0.17366666666666666,/models/qwen3-next-80b-a3b-reasoning/providers,false\nQwen3 32B,0.17266666666666666,/models/qwen3-32b-instruct-reasoning/providers,false\nClaude 3 Haiku,0.17166666666666666,/models/claude-3-haiku/providers,false\nNVIDIA Nemotron 3 Nano,0.17116666666666666,/models/nvidia-nemotron-3-nano-30b-a3b-reasoning/providers,false\nNova Pro,0.17033333333333334,/models/nova-pro/providers,false\nGrok 4.1 Fast,0.16966666666666666,/models/grok-4-1-fast/providers,false\nApriel-v1.6-15B-Thinker,0.1695,/models/apriel-v1-6-15b-thinker/providers,false\nLlama Nemotron Super 49B v1.5,0.16933333333333334,/models/llama-nemotron-super-49b-v1-5-reasoning/providers,false\nQwen3 VL 30B A3B,0.1685,/models/qwen3-vl-30b-a3b-reasoning/providers,false\nQwen3 235B 2507,0.16783333333333333,/models/qwen3-235b-a22b-instruct-2507/providers,false\nQwen3 VL 32B,0.1675,/models/qwen3-vl-32b-reasoning/providers,false\nMi:dm K 2.5 Pro,0.1675,/models/mi-dm-k-2-5-pro-dec28/providers,false\nK-EXAONE,0.16483333333333333,/models/k-exaone/providers,false\nLlama 3.1 Nemotron 70B,0.1645,/models/llama-3-1-nemotron-instruct-70b/providers,false\nRing-flash-2.0,0.163,/models/ring-flash-2-0/providers,false\nQwen3 30B A3B 2507,0.16233333333333333,/models/qwen3-30b-a3b-2507-reasoning/providers,false\nQwen3 30B,0.16133333333333333,/models/qwen3-30b-a3b-instruct-reasoning/providers,false\nGPT-5 nano (medium),0.161,/models/gpt-5-nano-medium/providers,false\nNova 2.0 Pro Preview,0.16083333333333333,/models/nova-2-0-pro/providers,false\nGrok 4 Fast,0.15916666666666668,/models/grok-4-fast/providers,false\nGLM-4.7-Flash,0.159,/models/glm-4-7-flash/providers,false\nNova 2.0 Lite (low),0.15866666666666668,/models/nova-2-0-lite-reasoning-low/providers,false\nDevstral Small 
(May),0.1585,/models/devstral-small-2505/providers,false\nQwen3 Coder 30B A3B,0.15833333333333333,/models/qwen3-coder-30b-a3b-instruct/providers,false\nQwen3 Coder Next,0.15816666666666668,/models/qwen3-coder-next/providers,false\nQwen3.5 27B,0.15733333333333333,/models/qwen3-5-27b-non-reasoning/providers,false\nK2 Think V2,0.15716666666666668,/models/k2-think-v2/providers,false\nCommand A,0.15716666666666668,/models/command-a/providers,false\ngpt-oss-120B (low),0.15716666666666668,/models/gpt-oss-120b-low/providers,false\nQwen3.5 35B A3B,0.15666666666666668,/models/qwen3-5-35b-a3b-non-reasoning/providers,false\nGLM-4.6V,0.15616666666666668,/models/glm-4-6v-reasoning/providers,false\nSolar Pro 2,0.15566666666666668,/models/solar-pro-2/providers,false\ngpt-oss-20B (high),0.15533333333333332,/models/gpt-oss-20b/providers,false\nQwen3 VL 30B A3B,0.15516666666666667,/models/qwen3-vl-30b-a3b-instruct/providers,false\nGLM-4.5-Air,0.155,/models/glm-4-5-air/providers,false\nGrok 3 mini Reasoning (high),0.154,/models/grok-3-mini-reasoning/providers,false\nMotif-2-12.7B,0.1535,/models/motif-2-12-7b/providers,false\nDevstral Small 2,0.15333333333333332,/models/devstral-small-2/providers,false\nMiMo-V2-Flash,0.15216666666666667,/models/mimo-v2-flash/providers,false\nQwen3 30B A3B 2507,0.151,/models/qwen3-30b-a3b-2507/providers,false\nMistral Small 3.1,0.15,/models/mistral-small-3-1/providers,false\nQwen3 14B,0.1495,/models/qwen3-14b-instruct-reasoning/providers,false\nQwen3 VL 32B,0.14866666666666667,/models/qwen3-vl-32b-instruct/providers,false\nQwen3 Omni 30B A3B,0.14816666666666667,/models/qwen3-omni-30b-a3b-reasoning/providers,false\nQwen3 Coder 480B,0.14716666666666667,/models/qwen3-coder-480b-a35b-instruct/providers,false\nMistral Small 3.2,0.14666666666666667,/models/mistral-small-3-2/providers,false\nDevstral Small,0.14666666666666667,/models/devstral-small/providers,false\nQwen3.5 9B,0.14666666666666667,/models/qwen3-5-9b/providers,false\nHyperCLOVA X SEED Think 
(32B),0.14633333333333334,/models/hyperclova-x-seed-think-32b/providers,false\nLlama 4 Scout,0.1455,/models/llama-4-scout/providers,false\nGemini 2.5 Flash-Lite (Sep),0.14283333333333334,/models/gemini-2-5-flash-lite-preview-09-2025/providers,false\ngpt-oss-20B (low),0.14183333333333334,/models/gpt-oss-20b-low/providers,false\nFalcon-H1R-7B,0.14083333333333334,/models/falcon-h1r-7b/providers,false\nLing-flash-2.0,0.14,/models/ling-flash-2-0/providers,false\nClaude 4.5 Haiku,0.13666666666666666,/models/claude-4-5-haiku/providers,false\nOlmo 3.1 32B Think,0.136,/models/olmo-3-1-32b-think/providers,false\nEXAONE 4.0 32B,0.136,/models/exaone-4-0-32b-reasoning/providers,false\nNova 2.0 Lite,0.135,/models/nova-2-0-lite/providers,false\nMagistral Small 1.2,0.13483333333333333,/models/magistral-small-2509/providers,false\nReka Flash 3,0.13366666666666666,/models/reka-flash-3/providers,false\nPhi-4,0.13183333333333333,/models/phi-4/providers,false\nClaude 3.5 Haiku,0.12966666666666668,/models/claude-3-5-haiku/providers,false\nNova 2.0 Omni,0.12866666666666668,/models/nova-2-0-omni/providers,false\nQwen3.5 4B,0.12783333333333333,/models/qwen3-5-4b/providers,false\nQwen3 14B,0.12733333333333333,/models/qwen3-14b-instruct/providers,false\nStep3 VL 10B,0.12716666666666668,/models/step3-vl-10b/providers,false\nQwen3 4B 2507,0.1265,/models/qwen3-4b-2507-instruct-reasoning/providers,false\nGemma 3 27B,0.12483333333333334,/models/gemma-3-27b/providers,false\nMinistral 3 14B,0.123,/models/ministral-3-14b/providers,false\nLlama Nemotron Super 49B v1.5,0.122,/models/llama-nemotron-super-49b-v1-5/providers,false\nK-EXAONE,0.12183333333333334,/models/k-exaone-non-reasoning/providers,false\nQwen3 VL 4B,0.12166666666666667,/models/qwen3-vl-4b-reasoning/providers,false\nGLM-4.7-Flash,0.11983333333333333,/models/glm-4-7-flash-non-reasoning/providers,false\nTri-21B-Think,0.11966666666666667,/models/tri-21b-think-v0-5/providers,false\nQwen3 
30B,0.1165,/models/qwen3-30b-a3b-instruct/providers,false\nJamba 1.7 Mini,0.114,/models/jamba-1-7-mini/providers,false\nNVIDIA Nemotron 3 Nano,0.11366666666666667,/models/nvidia-nemotron-3-nano-30b-a3b/providers,false\nOlmo 3.1 32B Instruct,0.11333333333333333,/models/olmo-3-1-32b-instruct/providers,false\nNVIDIA Nemotron Nano 9B V2,0.11266666666666666,/models/nvidia-nemotron-nano-9b-v2-reasoning/providers,false\nEXAONE 4.0 32B,0.1035,/models/exaone-4-0-32b/providers,false\nQwen3 VL 4B,0.1035,/models/qwen3-vl-4b-instruct/providers,false\nQwen3 4B 2507,0.10316666666666667,/models/qwen3-4b-2507-instruct/providers,false\nLlama 3.2 11B (Vision),0.102,/models/llama-3-2-instruct-11b-vision/providers,false\nNova Lite,0.09733333333333333,/models/nova-lite/providers,false\nGranite 4.0 Micro,0.09016666666666667,/models/granite-4-0-micro/providers,false\nTri-21B-think Preview,0.08733333333333333,/models/tri-21b-think-preview/providers,false\nLing-mini-2.0,0.087,/models/ling-mini-2-0/providers,false\nGranite 3.3 8B,0.08533333333333333,/models/granite-3-3-8b-instruct/providers,false\nLlama 3.1 8B,0.08066666666666666,/models/llama-3-1-instruct-8b/providers,false\nGemma 3 4B,0.078,/models/gemma-3-4b/providers,false\nGemma 3n E4B,0.0775,/models/gemma-3n-e4b/providers,false\nLlama 3.2 1B,0.06933333333333333,/models/llama-3-2-instruct-1b/providers,false\nLFM2 8B A1B,0.06833333333333333,/models/lfm2-8b-a1b/providers,false\nGemma 3n E2B,0.0665,/models/gemma-3n-e2b/providers,false\nLFM2.5-1.2B-Thinking,0.06616666666666667,/models/lfm2-5-1-2b-thinking/providers,false\nJamba Reasoning 3B,0.066,/models/jamba-reasoning-3b/providers,false\nQwen3 1.7B,0.066,/models/qwen3-1.7b-instruct/providers,false\nLFM2 24B A2B,0.06416666666666666,/models/lfm2-24b-a2b/providers,false\nGranite 4.0 1B,0.0605,/models/granite-4-0-nano-1b/providers,false\nLFM2.5-1.2B-Instruct,0.059833333333333336,/models/lfm2-5-1-2b-instruct/providers,false\nQwen3 
0.6B,0.05516666666666667,/models/qwen3-0.6b-instruct-reasoning/providers,false\nLFM2 2.6B,0.052333333333333336,/models/lfm2-2-6b/providers,false\nLFM2.5-VL-1.6B,0.052,/models/lfm2-5-vl-1-6b/providers,false\nLFM2 1.2B,0.042833333333333334,/models/lfm2-1-2b/providers,false\nQwen3 0.6B,0.042166666666666665,/models/qwen3-0.6b-instruct/providers,false\nGranite 4.0 H 350M,0.03666666666666667,/models/granite-4-0-h-350m/providers,false\nGemma 3 1B,0.035833333333333335,/models/gemma-3-1b/providers,false\nQwen3.5 0.8B,0.010833333333333334,/models/qwen3-5-0-8b/providers,false\nGemma 3 270M,0.008666666666666666,/models/gemma-3-270m/providers,false"}
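The block above is a schema.org `Dataset` object whose `data` field holds a CSV table with the columns `modelName`, `omniscienceAccuracy`, `detailsUrl`, and `isLabClaimedValue`. A minimal sketch of reading it with the Python standard library follows. The function name `parse_dataset_rows` and the three-row sample are illustrative; the sample rows are copied from the data above. If the blob has been hard-wrapped during extraction, the lines would likely need to be rejoined before `json.loads` accepts it.

```python
import csv
import io
import json


def parse_dataset_rows(jsonld_text: str) -> dict[str, dict]:
    """Parse the CSV embedded in the JSON-LD "data" field, keyed by detailsUrl."""
    doc = json.loads(jsonld_text)
    reader = csv.DictReader(io.StringIO(doc["data"]))
    rows = {}
    for row in reader:
        rows[row["detailsUrl"]] = {
            "modelName": row["modelName"],
            "omniscienceAccuracy": float(row["omniscienceAccuracy"]),
            "isLabClaimedValue": row["isLabClaimedValue"] == "true",
        }
    return rows


# Small stand-in for the full blob, built from three rows of the data above.
sample_csv = (
    "modelName,omniscienceAccuracy,detailsUrl,isLabClaimedValue\n"
    "Gemini 3 Flash,0.54,/models/gemini-3-flash-reasoning/providers,false\n"
    "Gemini 3 Flash,0.45516666666666666,/models/gemini-3-flash/providers,false\n"
    '"Claude Sonnet 4.6 (Non-reasoning, Low Effort)",0.36133333333333334,'
    "/models/claude-sonnet-4-6-non-reasoning-low-effort/providers,false\n"
)
sample_jsonld = json.dumps({"@type": "Dataset", "data": sample_csv})

rows = parse_dataset_rows(sample_jsonld)
```

Rows are keyed by `detailsUrl` rather than `modelName` because display names repeat: "Gemini 3 Flash" appears once for the reasoning variant and once for the non-reasoning variant. Using `csv.DictReader` rather than splitting on commas also handles quoted names that contain a comma.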
AA-Omniscience Hallucination Rate
AA-Omniscience Hallucination Rate (lower is better) measures how often the model answers incorrectly when it should have refused or admitted to not knowing the answer. It is defined as the proportion of incorrect answers out of all non-correct responses, i.e. incorrect / (incorrect + partial answers + not attempted).
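The definition above can be sketched directly from response counts. Because the denominator excludes correct answers, a model can combine high accuracy with a high hallucination rate; the second helper makes this visible by converting the two published rates into the share of all questions answered incorrectly. Function names are illustrative, and returning 0.0 when there are no non-correct responses is an assumed convention. The two input values are the figures listed on this page for Gemini 3 Pro Preview (high).

```python
def hallucination_rate(incorrect: int, partial: int, not_attempted: int) -> float:
    """incorrect / (incorrect + partial answers + not attempted)."""
    non_correct = incorrect + partial + not_attempted
    if non_correct == 0:
        return 0.0  # assumed convention: nothing was answered non-correctly
    return incorrect / non_correct


def incorrect_share(accuracy: float, halluc_rate: float) -> float:
    """Share of all questions answered incorrectly.

    Non-correct responses make up (1 - accuracy) of all questions, and
    halluc_rate is the fraction of those that are incorrect answers.
    """
    return halluc_rate * (1.0 - accuracy)


# Hypothetical counts: 30 wrong answers out of 50 non-correct responses.
example_rate = hallucination_rate(incorrect=30, partial=5, not_attempted=15)

# Gemini 3 Pro Preview (high): accuracy 0.559, hallucination rate 0.909297...
gemini_incorrect = incorrect_share(0.559, 0.909297052154195)
```

For Gemini 3 Pro Preview (high) this gives roughly 0.401: about 40% of all questions are answered incorrectly even though 55.9% are answered correctly, leaving only about 4% as partial answers or refusals. Under the further assumption that the Index is 100 × (correct − incorrect) / total, these figures give 100 × (0.559 − 0.401) = 15.8, consistent with the score of 16 reported under Highlights.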
{"@context":"https://schema.org","@type":"Dataset","name":"AA-Omniscience Hallucination Rate","creator":{"@type":"Organization","name":"Artificial Analysis","url":"https://artificialanalysis.ai"},"description":"AA-Omniscience Hallucination Rate (lower is better) measures how often the model answers incorrectly when it should have refused or admitted to not knowing the answer. It is defined as the proportion of incorrect answers out of all non-correct responses, i.e. incorrect / (incorrect + partial answers + not attempted).","measurementTechnique":"Independent test run by Artificial Analysis on dedicated hardware.","spatialCoverage":"Worldwide","keywords":["analytics","llm","AI","benchmark","model","gpt","claude"],"license":"https://artificialanalysis.ai/docs/legal/Terms-of-Use.pdf","isAccessibleForFree":true,"citation":"Artificial Analysis (2025). LLM benchmarks dataset. https://artificialanalysis.ai","data":"modelName,omniscienceHallucinationRate,detailsUrl,isLabClaimedValue\nGemma 3 4B,0.9882501807664498,/models/gemma-3-4b/providers,false\nLFM2.5-1.2B-Thinking,0.9689452079243263,/models/lfm2-5-1-2b-thinking/providers,false\nJamba 1.7 Mini,0.9667042889390519,/models/jamba-1-7-mini/providers,false\nQwen3 1.7B,0.9643112062812277,/models/qwen3-1.7b-instruct/providers,false\nQwen3 VL 4B,0.9611451942740287,/models/qwen3-vl-4b-instruct/providers,false\nLing-mini-2.0,0.9580138736765242,/models/ling-mini-2-0/providers,false\nGemma 3n E4B,0.9555555555555556,/models/gemma-3n-e4b/providers,false\nGranite 4.0 Micro,0.9554863528118703,/models/granite-4-0-micro/providers,false\nQwen3 30B A3B 2507,0.9542599136238712,/models/qwen3-30b-a3b-2507/providers,false\nHermes 4 70B,0.9541603302194221,/models/hermes-4-llama-3-1-70b-reasoning/providers,false\nGLM-4.6,0.9500114652602614,/models/glm-4-6-reasoning/providers,false\nHermes 4 405B,0.9482100238663485,/models/hermes-4-llama-3-1-405b-reasoning/providers,false\nSolar Pro 
2,0.9445249430759677,/models/solar-pro-2-reasoning/providers,false\nGranite 4.0 H 350M,0.9437716262975778,/models/granite-4-0-h-350m/providers,false\ngpt-oss-20B (high),0.9406077348066298,/models/gpt-oss-20b/providers,false\nLing-1T,0.9399043460178831,/models/ling-1t/providers,false\nLFM2.5-VL-1.6B,0.939521800281294,/models/lfm2-5-vl-1-6b/providers,false\nQwen3 0.6B,0.9370106142335132,/models/qwen3-0.6b-instruct/providers,false\nGranite 4.0 1B,0.935249246052865,/models/granite-4-0-nano-1b/providers,false\nDeepSeek V3.2,0.934945054945055,/models/deepseek-v3-2/providers,false\nGemini 2.5 Flash,0.933076225045372,/models/gemini-2-5-flash/providers,false\nQwen3 Next 80B A3B,0.9316066369890732,/models/qwen3-next-80b-a3b-instruct/providers,false\nGranite 3.3 8B,0.9309402332361516,/models/granite-3-3-8b-instruct/providers,false\nQwen3 VL 30B A3B,0.9293746301045571,/models/qwen3-vl-30b-a3b-instruct/providers,false\nQwen3 VL 4B,0.9280834914611006,/models/qwen3-vl-4b-reasoning/providers,false\nQwen3 0.6B,0.9239724819192098,/models/qwen3-0.6b-instruct-reasoning/providers,false\nGemma 3n E2B,0.9237636136404214,/models/gemma-3n-e2b/providers,false\nGLM-4.5-Air,0.9232741617357002,/models/glm-4-5-air/providers,false\nGLM-4.7,0.9226725082146768,/models/glm-4-7-non-reasoning/providers,false\nGemini 3 Flash,0.922463768115942,/models/gemini-3-flash-reasoning/providers,false\nNova 2.0 Omni (medium),0.9220621067586767,/models/nova-2-0-omni-reasoning-medium/providers,false\nQwen3.5 35B A3B,0.9209486166007905,/models/qwen3-5-35b-a3b-non-reasoning/providers,false\nQwen3 14B,0.9180672268907563,/models/qwen3-14b-instruct/providers,false\nQwen3 Max Thinking,0.9179548156956004,/models/qwen3-max-thinking/providers,false\nK2-V2 (high),0.9176014684886804,/models/k2-v2/providers,false\nMercury 2,0.9151121358205827,/models/mercury-2/providers,false\nSolar Pro 2,0.9149230161863403,/models/solar-pro-2/providers,false\nQwen3 Max Thinking 
(Preview),0.9130233634050428,/models/qwen3-max-thinking-preview/providers,false\nQwen3 VL 32B,0.9124902114330462,/models/qwen3-vl-32b-instruct/providers,false\ngpt-oss-120B (high),0.9118708855383308,/models/gpt-oss-120b/providers,false\nMiMo-V2-Flash,0.9105438401775805,/models/mimo-v2-flash-reasoning/providers,false\nGemini 3 Pro Preview (high),0.909297052154195,/models/gemini-3-pro/providers,false\nNVIDIA Nemotron 3 Nano,0.9091763820985332,/models/nvidia-nemotron-3-nano-30b-a3b/providers,false\nQwen3 Coder Next,0.9091269055632548,/models/qwen3-coder-next/providers,false\nApriel-v1.6-15B-Thinker,0.9088902267710215,/models/apriel-v1-6-15b-thinker/providers,false\nQwen3 VL 8B,0.9071949947862357,/models/qwen3-vl-8b-reasoning/providers,false\nMagistral Small 1.2,0.9069543440570218,/models/magistral-small-2509/providers,false\nNova 2.0 Lite (medium),0.9043442790128493,/models/nova-2-0-lite-reasoning-medium/providers,false\nQwen3 VL 8B,0.9041953663118347,/models/qwen3-vl-8b-instruct/providers,false\nMotif-2-12.7B,0.9041149832644221,/models/motif-2-12-7b/providers,false\nGLM-4.7,0.9029217719132894,/models/glm-4-7/providers,false\nDeepSeek V3.2 Exp,0.9027598102630444,/models/deepseek-v3-2-0925/providers,false\nGPT-5.1,0.9024906015037594,/models/gpt-5-1-non-reasoning/providers,false\nMinistral 3 14B,0.9021284682630178,/models/ministral-3-14b/providers,false\nGemini 3 Flash,0.9018048332823494,/models/gemini-3-flash/providers,false\nReka Flash 3,0.9011158137745287,/models/reka-flash-3/providers,false\nLing-flash-2.0,0.8996124031007752,/models/ling-flash-2-0/providers,false\nNova 2.0 Pro Preview (medium),0.8976714377269814,/models/nova-2-0-pro-reasoning-medium/providers,false\nRing-flash-2.0,0.8956590999601752,/models/ring-flash-2-0/providers,false\nGemma 3 27B,0.8954484860026661,/models/gemma-3-27b/providers,false\nDeepSeek R1 (Jan),0.8953572287707481,/models/deepseek-r1-0120/providers,false\nGemini 2.5 Flash 
(Sep),0.8947491795593061,/models/gemini-2-5-flash-preview-09-2025-reasoning/providers,false\nGLM-4.7-Flash,0.8945699564011098,/models/glm-4-7-flash/providers,false\nDeepSeek V3 (Dec),0.8939258597588209,/models/deepseek-v3/providers,false\nQwen3 Max,0.8939126599029554,/models/qwen3-max/providers,false\nMiniMax-M2.5,0.8929539295392954,/models/minimax-m2-5/providers,false\nQwen3 235B A22B 2507,0.8923076923076924,/models/qwen3-235b-a22b-instruct-2507-reasoning/providers,false\nQwen3 VL 30B A3B,0.8919623170976148,/models/qwen3-vl-30b-a3b-reasoning/providers,false\nK-EXAONE,0.8910397126322092,/models/k-exaone/providers,false\nQwen3.5 122B A10B,0.8906633906633906,/models/qwen3-5-122b-a10b-non-reasoning/providers,false\nQwen3.5 397B A17B,0.890507404709881,/models/qwen3-5-397b-a17b/providers,false\nMi:dm K 2.5 Pro,0.8890890890890891,/models/mi-dm-k-2-5-pro-dec28/providers,false\nNova 2.0 Omni,0.887911247130834,/models/nova-2-0-omni/providers,false\nQwen3 VL 235B A22B,0.8872651356993737,/models/qwen3-vl-235b-a22b-instruct/providers,false\nQwen3 Omni 30B A3B,0.886714928585404,/models/qwen3-omni-30b-a3b-reasoning/providers,false\nFalcon-H1R-7B,0.8865179437439379,/models/falcon-h1r-7b/providers,false\nMiniMax-M2,0.8841632827527249,/models/minimax-m2/providers,false\nGPT-5 mini (minimal),0.8836830311672439,/models/gpt-5-mini-minimal/providers,false\nGemini 2.5 Flash (Sep),0.8829811406498523,/models/gemini-2-5-flash-preview-09-2025/providers,false\nSolar Open 100B,0.8811800610376399,/models/solar-open-100b-reasoning/providers,false\nLFM2 8B A1B,0.8797853309481216,/models/lfm2-8b-a1b/providers,false\nGemini 2.5 Pro,0.8737704918032787,/models/gemini-2-5-pro/providers,false\nLlama 4 Maverick,0.8731556925787272,/models/llama-4-maverick/providers,false\nQwen3 30B A3B 2507,0.8726621567847195,/models/qwen3-30b-a3b-2507-reasoning/providers,false\nNova 2.0 Pro Preview 
(low),0.8720034246575342,/models/nova-2-0-pro-reasoning-low/providers,false\no3,0.8712121212121212,/models/o3/providers,false\nQwen3 235B,0.8700081499592502,/models/qwen3-235b-a22b-instruct/providers,false\nGLM-4.5V,0.869874137231019,/models/glm-4-5v/providers,false\nINTELLECT-3,0.8697895171275278,/models/intellect-3/providers,false\nGPT-5.3 Codex (xhigh),0.8689941237469755,/models/gpt-5-3-codex/providers,false\nQwen3 30B,0.8683267308055084,/models/qwen3-30b-a3b-instruct/providers,false\nMiniMax M1 80k,0.8680042238648363,/models/minimax-m1-80k/providers,false\nGemini 2.5 Flash-Lite (Sep),0.8676291175274502,/models/gemini-2-5-flash-lite-preview-09-2025-reasoning/providers,false\ngpt-oss-20B (low),0.865993396776073,/models/gpt-oss-20b-low/providers,false\nDevstral Small (May),0.8655179243414538,/models/devstral-small-2505/providers,false\nGemini 2.0 Flash,0.8654307524536532,/models/gemini-2-0-flash/providers,false\nGPT-5 (minimal),0.8621408711770158,/models/gpt-5-minimal/providers,false\nEXAONE 4.0 32B,0.8611111111111112,/models/exaone-4-0-32b-reasoning/providers,false\nJamba Reasoning 3B,0.859564596716631,/models/jamba-reasoning-3b/providers,false\nQwen3 Max (Preview),0.8567026194144838,/models/qwen3-max-preview/providers,false\nTri-21B-Think,0.8549791745550928,/models/tri-21b-think-v0-5/providers,false\nQwen3.5 122B A10B,0.854517271922055,/models/qwen3-5-122b-a10b/providers,false\nDeepSeek V3.1 Terminus,0.8534726758110167,/models/deepseek-v3-1-terminus/providers,false\nDevstral Small 2,0.8511811023622047,/models/devstral-small-2/providers,false\nLlama 3.3 70B,0.8511372867587328,/models/llama-3-3-instruct-70b/providers,false\nDevstral 2,0.8506013926988816,/models/devstral-2/providers,false\nGemini 3 Pro Preview (low),0.8485516372795969,/models/gemini-3-pro-low/providers,false\nLFM2.5-1.2B-Instruct,0.8484311292324056,/models/lfm2-5-1-2b-instruct/providers,false\nGrok 3,0.8460823064403628,/models/grok-3/providers,false\nERNIE 5.0 Thinking 
Preview,0.8458911419423693,/models/ernie-5-0-thinking-preview/providers,false\nRing-1T,0.8437240232751455,/models/ring-1t/providers,false\nGLM-4.7-Flash,0.8432115129710283,/models/glm-4-7-flash-non-reasoning/providers,false\nQwen3 4B 2507,0.8410608662468995,/models/qwen3-4b-2507-instruct-reasoning/providers,false\nDeepSeek R1 0528,0.8404537774559498,/models/deepseek-r1/providers,false\nQwen3.5 35B A3B,0.8401424680494448,/models/qwen3-5-35b-a3b/providers,false\nNova 2.0 Omni (low),0.8395339329517579,/models/nova-2-0-omni-reasoning-low/providers,false\nNova 2.0 Lite,0.8393063583815029,/models/nova-2-0-lite/providers,false\nGemma 3 1B,0.8385479688850476,/models/gemma-3-1b/providers,false\nMistral Large 3,0.8374698001317813,/models/mistral-large-3/providers,false\nK-EXAONE,0.8352628582273677,/models/k-exaone-non-reasoning/providers,false\nDeepSeek V3.1,0.8352124891587164,/models/deepseek-v3-1/providers,false\nSeed-OSS-36B-Instruct,0.8340425531914893,/models/seed-oss-36b-instruct/providers,false\nNVIDIA Nemotron 3 Nano,0.8294791876131108,/models/nvidia-nemotron-3-nano-30b-a3b-reasoning/providers,false\nQwen3 VL 235B A22B,0.8251998317206563,/models/qwen3-vl-235b-a22b-reasoning/providers,false\nQwen3.5 9B,0.8244140625,/models/qwen3-5-9b/providers,false\nQwen3 Next 80B A3B,0.8221056877773296,/models/qwen3-next-80b-a3b-reasoning/providers,false\nStep3 VL 10B,0.8216536184838648,/models/step3-vl-10b/providers,false\nGPT-5 (high),0.8211176635776467,/models/gpt-5/providers,false\nGPT-4.1 mini,0.8197979797979797,/models/gpt-4-1-mini/providers,false\nQwen3 32B,0.8195004029008863,/models/qwen3-32b-instruct-reasoning/providers,false\nGrok 4.1 Fast,0.8179446005620233,/models/grok-4-1-fast/providers,false\nDeepSeek V3.2,0.8168837675350702,/models/deepseek-v3-2-reasoning/providers,false\nLlama Nemotron Ultra,0.8166874739908447,/models/llama-3-1-nemotron-ultra-253b-v1-reasoning/providers,false\nGemini 3.1 Flash-Lite 
Preview,0.8162997903563941,/models/gemini-3-1-flash-lite-preview/providers,false\nLlama 3.2 11B (Vision),0.8160727542687454,/models/llama-3-2-instruct-11b-vision/providers,false\nK2-V2 (medium),0.8148446954417103,/models/k2-v2-medium/providers,false\nMistral Medium 3.1,0.8130452845866224,/models/mistral-medium-3-1/providers,false\nDeepSeek V3.2 Exp,0.8109175377468061,/models/deepseek-v3-2-reasoning-0925/providers,false\nEXAONE 4.0 32B,0.8103736754043502,/models/exaone-4-0-32b/providers,false\nPhi-4,0.8051449414474947,/models/phi-4/providers,false\nQwen3 VL 32B,0.8042042042042042,/models/qwen3-vl-32b-reasoning/providers,false\nHermes 4 405B,0.8037974683544303,/models/hermes-4-llama-3-1-405b/providers,false\nDeepSeek V3.1,0.8033247483025052,/models/deepseek-v3-1-reasoning/providers,false\nQwen3 30B,0.8028616852146264,/models/qwen3-30b-a3b-instruct-reasoning/providers,false\no4-mini (high),0.8011941618752765,/models/o4-mini/providers,false\nHermes 4 70B,0.8011815033611733,/models/hermes-4-llama-3-1-70b/providers,false\nGPT-5 (medium),0.8007088331515813,/models/gpt-5-medium/providers,false\nQwen3.5 397B A17B,0.7984581497797357,/models/qwen3-5-397b-a17b-non-reasoning/providers,false\nQwen3.5 4B,0.7974393273456908,/models/qwen3-5-4b/providers,false\nGPT-5.2 (xhigh),0.797153024911032,/models/gpt-5-2/providers,false\nQwen3.5 27B,0.7969211303247575,/models/qwen3-5-27b/providers,false\nGPT-4.1,0.7962189492196087,/models/gpt-4-1/providers,false\nNova 2.0 Lite (low),0.7955625990491284,/models/nova-2-0-lite-reasoning-low/providers,false\nDoubao Seed Code,0.7940391459074733,/models/doubao-seed-code/providers,false\nLFM2 1.2B,0.7929653491206686,/models/lfm2-1-2b/providers,false\nQwen3 Coder 30B A3B,0.7922772277227723,/models/qwen3-coder-30b-a3b-instruct/providers,false\nHyperCLOVA X SEED Think (32B),0.7907067551737602,/models/hyperclova-x-seed-think-32b/providers,false\no3-mini (high),0.78978471929084,/models/o3-mini-high/providers,false\nGrok Code Fast 
1,0.7851673594399475,/models/grok-code-fast-1/providers,false\nLlama 4 Scout,0.7831090306221962,/models/llama-4-scout/providers,false\nClaude 3 Haiku,0.7824949698189135,/models/claude-3-haiku/providers,false\nGrok 4 Fast,0.7823587710604559,/models/grok-4-fast/providers,false\ngpt-oss-120B (low),0.781491002570694,/models/gpt-oss-120b-low/providers,false\nQwen3 4B 2507,0.7799665489685932,/models/qwen3-4b-2507-instruct/providers,false\nNova Pro,0.7788268380875853,/models/nova-pro/providers,false\nQwen3 235B,0.7718545751633987,/models/qwen3-235b-a22b-instruct-reasoning/providers,false\nDevstral Small,0.7712890625,/models/devstral-small/providers,false\nMistral Small 3.1,0.7684313725490196,/models/mistral-small-3-1/providers,false\nDeepSeek V3 0324,0.7674518201284797,/models/deepseek-v3-0324/providers,false\nNova 2.0 Pro Preview,0.765441906653426,/models/nova-2-0-pro/providers,false\nMistral Small 3.2,0.7626953125,/models/mistral-small-3-2/providers,false\nCommand A,0.7615186869685584,/models/command-a/providers,false\nClaude Opus 4.6,0.7599513825584928,/models/claude-opus-4-6/providers,false\nGPT-5 (low),0.7583910495471498,/models/gpt-5-low/providers,false\nLlama Nemotron Super 49B v1.5,0.7554173354735152,/models/llama-nemotron-super-49b-v1-5-reasoning/providers,false\nQwen3 14B,0.7546541250244954,/models/qwen3-14b-instruct-reasoning/providers,false\nGemini 2.5 Flash-Lite,0.7541944612896705,/models/gemini-2-5-flash-lite-reasoning/providers,false\nClaude Opus 4.5,0.7539370078740157,/models/claude-opus-4-5/providers,false\nMiMo-V2-Flash,0.7513269117357971,/models/mimo-v2-flash/providers,false\nLlama 3.1 70B,0.7471573289228861,/models/llama-3-1-instruct-70b/providers,false\nGPT-5.1 Codex (high),0.7436951754385965,/models/gpt-5-1-codex/providers,false\nKimi K2,0.7424276930084264,/models/kimi-k2/providers,false\nQwen3.5 27B,0.741495253164557,/models/qwen3-5-27b-non-reasoning/providers,false\nGPT-5 Codex (high),0.7407608695652174,/models/gpt-5-codex/providers,false\nGPT-5.2 
Codex (xhigh),0.7282181000562113,/models/gpt-5-2-codex/providers,false\nDeepSeek V3.1 Terminus,0.7250231267345051,/models/deepseek-v3-1-terminus-reasoning/providers,false\nGrok 4.1 Fast,0.7241071428571428,/models/grok-4-1-fast-reasoning/providers,false\nKimi K2 Thinking,0.7216003810431055,/models/kimi-k2-thinking/providers,false\nQwen3 235B 2507,0.7075906268776286,/models/qwen3-235b-a22b-instruct-2507/providers,false\nGemini 2.5 Flash,0.7065628476084539,/models/gemini-2-5-flash-reasoning/providers,false\nOlmo 3.1 32B Instruct,0.7039473684210527,/models/olmo-3-1-32b-instruct/providers,false\nTri-21B-think Preview,0.7014243973703433,/models/tri-21b-think-preview/providers,false\nLFM2 24B A2B,0.6997328584149599,/models/lfm2-24b-a2b/providers,false\nKimi K2 0905,0.6960214573088959,/models/kimi-k2-0905/providers,false\no1,0.6925236029599388,/models/o1/providers,false\nLlama 3.1 Nemotron 70B,0.6882106523040096,/models/llama-3-1-nemotron-instruct-70b/providers,false\nNova Premier,0.6847310941685555,/models/nova-premier/providers,false\nMistral Large 2 (Nov),0.6784223706176962,/models/mistral-large-2/providers,false\nGLM-4.5,0.6715522984676882,/models/glm-4.5/providers,false\nGLM-4.6V,0.6711111111111111,/models/glm-4-6v/providers,false\nERNIE 4.5 300B A47B,0.6710391473662636,/models/ernie-4-5-300b-a47b/providers,false\nMiniMax-M2.1,0.6701592623637888,/models/minimax-m2-1/providers,false\nOlmo 3.1 32B Think,0.668016975308642,/models/olmo-3-1-32b-think/providers,false\nLlama 3.2 1B,0.6678008595988538,/models/llama-3-2-instruct-1b/providers,false\nKAT-Coder-Pro V1,0.6655221167440921,/models/kat-coder-pro-v1/providers,false\nGemini 2.5 Flash-Lite (Sep),0.6643982111608011,/models/gemini-2-5-flash-lite-preview-09-2025/providers,false\nGLM-4.6,0.6612665684830633,/models/glm-4-6/providers,false\nGrok 4 Fast,0.6596294700560104,/models/grok-4-fast-reasoning/providers,false\nClaude Sonnet 4.6,0.6593229446534121,/models/claude-sonnet-4-6/providers,false\nLlama Nemotron Super 49B 
v1.5,0.6586940015186029,/models/llama-nemotron-super-49b-v1-5/providers,false\nKimi K2.5,0.6460289266683583,/models/kimi-k2-5/providers,false\nGrok 4,0.6420244526585158,/models/grok-4/providers,false\nDevstral Medium,0.6259007617871114,/models/devstral-medium/providers,false\nClaude Opus 4.6 (max),0.6133043207957725,/models/claude-opus-4-6-adaptive/providers,false\nNVIDIA Nemotron Nano 9B V2,0.6100676183320811,/models/nvidia-nemotron-nano-9b-v2-reasoning/providers,false\nMistral Medium 3,0.6091391268869849,/models/mistral-medium-3/providers,false\nGPT-5.2,0.6080603346688663,/models/gpt-5-2-non-reasoning/providers,false\nGPT-5.2 (medium),0.6057280513918629,/models/gpt-5-2-medium/providers,false\nLFM2 2.6B,0.604994723883222,/models/lfm2-2-6b/providers,false\nClaude Opus 4.5,0.5977893767270495,/models/claude-opus-4-5-thinking/providers,false\n\"Claude Sonnet 4.6 (Non-reasoning, Low Effort)\",0.5955114822546973,/models/claude-sonnet-4-6-non-reasoning-low-effort/providers,false\nMagistral Medium 1.2,0.594862076226574,/models/magistral-medium-2509/providers,false\nGPT-4o (Aug),0.5945414847161572,/models/gpt-4o-2024-08-06/providers,false\nMagistral Medium 1,0.5944923270969098,/models/magistral-medium/providers,false\nK2 Think V2,0.5888866917144552,/models/k2-think-v2/providers,false\nGPT-5 nano (high),0.5633285743422395,/models/gpt-5-nano/providers,false\nNova Lite,0.555576070901034,/models/nova-lite/providers,false\nGPT-5 mini (high),0.5408722331799255,/models/gpt-5-mini/providers,false\nClaude 3.7 Sonnet,0.5234774523477452,/models/claude-3-7-sonnet/providers,false\nGPT-5.1 (high),0.5132176234979974,/models/gpt-5-1/providers,false\nGPT-5.1 Codex mini (high),0.5076558119473797,/models/gpt-5-1-codex-mini/providers,false\nClaude 4.5 Sonnet,0.5065532306277305,/models/claude-4-5-sonnet/providers,false\nGLM-4.6V,0.49891368753703336,/models/glm-4-6v-reasoning/providers,false\nGemini 3.1 Pro Preview,0.4986964618249534,/models/gemini-3-1-pro-preview/providers,false\nGPT-5 nano 
(medium),0.4884783472387763,/models/gpt-5-nano-medium/providers,false\nMiMo-V2-Flash (Feb 2026),0.4839248434237996,/models/mimo-v2-0206/providers,false\nKimi K2.5,0.4785992217898833,/models/kimi-k2-5-non-reasoning/providers,false\nClaude 4.5 Sonnet,0.47399556322405717,/models/claude-4-5-sonnet-thinking/providers,false\nClaude Sonnet 4.6 (max),0.4613674263479711,/models/claude-sonnet-4-6-adaptive/providers,false\nGLM-5,0.43867212761370983,/models/glm-5-non-reasoning/providers,false\nQwen3 Coder 480B,0.4328708227477037,/models/qwen3-coder-480b-a35b-instruct/providers,false\nGPT-5 mini (medium),0.4239502999143102,/models/gpt-5-mini-medium/providers,false\nLlama 3.1 8B,0.4176939811457578,/models/llama-3-1-instruct-8b/providers,false\nClaude 3.5 Haiku,0.4153581003446955,/models/claude-3-5-haiku/providers,false\nClaude 4 Sonnet,0.4082509669101848,/models/claude-4-sonnet/providers,false\nClaude 3.7 Sonnet,0.38513200555812876,/models/claude-3-7-sonnet-thinking/providers,false\nGPT-4o (Nov),0.37891678771529363,/models/gpt-4o/providers,false\nQwen3.5 0.8B,0.3666385846672283,/models/qwen3-5-0-8b/providers,false\nGLM-5,0.34001823154056515,/models/glm-5/providers,false\nGemma 3 270M,0.32464694014794887,/models/gemma-3-270m/providers,false\nClaude 4 Sonnet,0.2851931330472103,/models/claude-4-sonnet-thinking/providers,false\nClaude 4.5 Haiku,0.26221235365361323,/models/claude-4-5-haiku-reasoning/providers,false\nGrok 3 mini Reasoning (high),0.25354609929078015,/models/grok-3-mini-reasoning/providers,false\nClaude 4.5 Haiku,0.2471042471042471,/models/claude-4-5-haiku/providers,false"}
Detailed Domain Results
AA-Omniscience Index Across Domains (Normalized)
AA-Omniscience Index (higher is better) measures knowledge reliability and hallucination. It rewards correct answers, penalizes hallucinations, and has no penalty for refusing to answer. Scores range from -100 to 100, where 0 means as many correct as incorrect answers, and negative scores mean more incorrect than correct.
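The description above fixes the shape of the score but not its exact formula. The sketch below assumes one form that is consistent with it: 100 × (correct − incorrect) / total questions, with partial answers and refusals counted in the total but neither rewarded nor penalized. The function name and the four count arguments are illustrative, not taken from the published methodology.

```python
def omniscience_index(correct, incorrect, partial=0, not_attempted=0):
    """Assumed form of the index: 100 * (correct - incorrect) / total.

    Partial answers and refusals only enlarge the denominator, so they
    carry no penalty beyond the missed chance to score a correct answer.
    """
    total = correct + incorrect + partial + not_attempted
    if total == 0:
        raise ValueError("at least one graded question is required")
    return 100.0 * (correct - incorrect) / total


# Hypothetical counts: same number of correct answers, different behavior
# on the questions the model does not know.
guesses_everything = omniscience_index(correct=40, incorrect=60)
refuses_when_unsure = omniscience_index(correct=40, incorrect=10, not_attempted=50)
```

Under this assumed form, the model that refuses instead of guessing scores higher even though both answer the same number of questions correctly, which matches the stated intent of penalizing hallucinations but not refusals.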
Scores are normalized per domain across all models tested, where green represents the highest score for that domain and red represents the lowest score for that domain.
Models shown: Gemini 3.1 Pro Preview; Claude Opus 4.6 (Adaptive Reasoning, Max Effort); Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort); Gemini 3 Flash Preview (Reasoning); GPT-5.3 Codex (xhigh); Grok 4; Claude Opus 4.6 (Non-reasoning, High Effort); GLM-5 (Reasoning); Claude 4.1 Opus (Reasoning); GPT-5.2 (xhigh); Claude 4.5 Haiku (Reasoning); Kimi K2.5 (Reasoning); Gemini 3.1 Flash-Lite Preview; MiMo-V2-Flash (Feb 2026); DeepSeek V3.2 (Reasoning); Grok 4.1 Fast (Reasoning); Qwen3.5 397B A17B (Reasoning); K2 Think V2; KAT-Coder-Pro V1; Mistral Large 3; MiniMax-M2.5; Llama 4 Maverick; Nova 2.0 Pro Preview (medium); gpt-oss-120B (high); NVIDIA Nemotron 3 Nano 30B A3B (Reasoning); Mi:dm K 2.5 Pro; K-EXAONE (Reasoning); gpt-oss-20B (high).
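The chart description says scores are normalized per domain, with the highest score in a domain shown in green and the lowest in red, but it does not name the scaling method. The sketch below assumes plain min-max scaling within each domain; the function name, the nested-dictionary layout, and the sample scores are all hypothetical.

```python
def normalize_per_domain(scores):
    """Min-max scale each domain's scores to [0, 1] across models.

    `scores` maps model name -> {domain name -> index score}.
    1.0 marks the best model in a domain, 0.0 the worst. This scaling
    is an assumption; the page only says scores are normalized per domain.
    """
    domains = {domain for per_model in scores.values() for domain in per_model}
    normalized = {model: {} for model in scores}
    for domain in domains:
        values = [per_model[domain] for per_model in scores.values() if domain in per_model]
        low, high = min(values), max(values)
        span = high - low
        for model, per_model in scores.items():
            if domain in per_model:
                # If every model ties, place them all at the midpoint.
                normalized[model][domain] = 0.5 if span == 0 else (per_model[domain] - low) / span
    return normalized


# Hypothetical index scores for three made-up models.
sample = {
    "model-a": {"Law": 10, "Health": -20},
    "model-b": {"Law": -10, "Health": 20},
    "model-c": {"Law": 0, "Health": 0},
}
scaled = normalize_per_domain(sample)
```

Because each domain is scaled on its own, a normalized value says how a model ranks within that domain, not how its raw index compares across domains.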
Software Engineering Deep Dive
Software Engineering AA-Omniscience Index Across Languages (Normalized)
Scores are normalized per language across all models tested, where green represents the highest score for that language and red represents the lowest score for that language.
Models shown: Gemini 3.1 Pro Preview; Claude Opus 4.6 (Adaptive Reasoning, Max Effort); Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort); Gemini 3 Flash Preview (Reasoning); GPT-5.3 Codex (xhigh); Grok 4; Claude Opus 4.6 (Non-reasoning, High Effort); GLM-5 (Reasoning); Claude 4.1 Opus (Reasoning); GPT-5.2 (xhigh); Claude 4.5 Haiku (Reasoning); Kimi K2.5 (Reasoning); Gemini 3.1 Flash-Lite Preview; MiMo-V2-Flash (Feb 2026); DeepSeek V3.2 (Reasoning); Grok 4.1 Fast (Reasoning); Qwen3.5 397B A17B (Reasoning); K2 Think V2; KAT-Coder-Pro V1; Mistral Large 3; MiniMax-M2.5; Llama 4 Maverick; Nova 2.0 Pro Preview (medium); gpt-oss-120B (high); NVIDIA Nemotron 3 Nano 30B A3B (Reasoning); Mi:dm K 2.5 Pro; K-EXAONE (Reasoning); gpt-oss-20B (high).
AA-Omniscience Index Question Breakdown
Business
Humanities & Social Sciences
Science, Engineering & Mathematics
Health
Law
Software Engineering (SWE)
Model Size (Open Weights Models Only)
AA-Omniscience Index vs. Total Parameters
The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.
AA-Omniscience Accuracy vs. Total Parameters
AA-Omniscience Accuracy (higher is better) measures the proportion of correctly answered questions out of all questions, regardless of whether the model chooses to answer.
AA-Omniscience Hallucination Rate vs. Total Parameters
AA-Omniscience Hallucination Rate (lower is better) measures how often the model answers incorrectly when it should have refused or admitted to not knowing the answer. It is defined as the proportion of incorrect answers out of all non-correct responses, i.e. incorrect / (incorrect + partial answers + not attempted).
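As a worked example of the ratio above, the following sketch computes the hallucination rate from per-category counts. The function name and the sample counts are hypothetical, and returning 0.0 when there are no non-correct responses is an assumption, since the definition leaves that case open.

```python
def hallucination_rate(incorrect, partial, not_attempted):
    """incorrect / (incorrect + partial answers + not attempted).

    Correct answers are deliberately absent: the rate only looks at
    questions the model failed to answer correctly.
    """
    non_correct = incorrect + partial + not_attempted
    if non_correct == 0:
        # Assumption: with nothing to hallucinate on, report 0.0.
        return 0.0
    return incorrect / non_correct


# Hypothetical run: of 100 non-correct responses, 30 were wrong answers,
# 10 were partial answers, and 60 were refusals.
rate = hallucination_rate(incorrect=30, partial=10, not_attempted=60)
```

Since correct answers do not enter the ratio, a model can combine high accuracy with a high hallucination rate if it guesses on most of the questions it cannot answer.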
AA-Omniscience Index: Token Usage
The total number of tokens used to run the evaluation, including input tokens (prompt), reasoning tokens (for reasoning models), and answer tokens (final response).
AA-Omniscience Index: Cost Breakdown
The cost to run the evaluation, calculated using the model's input and output token pricing and the number of tokens used.
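A minimal sketch of that cost calculation follows. It assumes prices are quoted in USD per million tokens and that reasoning tokens are billed at the output rate, which is common provider practice but is not stated on this page. The function name, parameters, and sample figures are hypothetical.

```python
def evaluation_cost(input_tokens, reasoning_tokens, answer_tokens,
                    input_price_per_million, output_price_per_million):
    """Estimate the cost of one evaluation run in USD.

    Assumes reasoning tokens are billed together with answer tokens
    at the output price, and that prices are per one million tokens.
    """
    output_tokens = reasoning_tokens + answer_tokens
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical run: 1M input tokens at $2/M, plus 400k reasoning and
# 100k answer tokens at $10/M.
cost = evaluation_cost(1_000_000, 400_000, 100_000, 2.0, 10.0)
```

With these made-up figures the output side dominates the bill, which is typical when a reasoning model spends many more tokens thinking than it receives in the prompt.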
AA-Omniscience Index: Score vs. Release Date
Example Problems
Explore Evaluations
A composite benchmark aggregating ten challenging evaluations to provide a holistic measure of AI capabilities across mathematics, science, coding, and reasoning.
GDPval-AA is Artificial Analysis' evaluation framework for OpenAI's GDPval dataset. It tests AI models on real-world tasks across 44 occupations and 9 major industries. Models are given shell access and web browsing capabilities in an agentic loop via Stirrup to solve tasks, with ELO ratings derived from blind pairwise comparisons.
A benchmark measuring factual recall and hallucination across various economically relevant domains.
A composite measure providing an industry standard to communicate model openness for users and developers.
An enhanced version of MMLU with 12,000 graduate-level questions across 14 subject areas, featuring ten answer options and deeper reasoning requirements.
A lightweight, multilingual version of MMLU, designed to evaluate knowledge and reasoning skills across a diverse range of languages and cultural contexts.
The most challenging 198 questions from GPQA, where PhD experts achieve 65% accuracy but skilled non-experts only reach 34% despite web access.
A frontier-level benchmark with 2,500 expert-vetted questions across mathematics, sciences, and humanities, designed to be the final closed-ended academic evaluation.
A contamination-free coding benchmark that continuously harvests fresh competitive programming problems from LeetCode, AtCoder, and CodeForces, evaluating code generation, self-repair, and execution.
A scientist-curated coding benchmark featuring 338 sub-tasks derived from 80 genuine laboratory problems across 16 scientific disciplines.
A 500-problem subset from the MATH dataset, featuring competition-level mathematics across six domains including algebra, geometry, and number theory.
A benchmark evaluating precise instruction-following generalization on 58 diverse, verifiable out-of-domain constraints that test models' ability to follow specific output requirements.
All 30 problems from the 2025 American Invitational Mathematics Examination, testing olympiad-level mathematical reasoning with integer answers from 000-999.
A benchmark designed to test LLMs on research-level physics reasoning tasks, featuring 71 composite research challenges.
An agentic benchmark evaluating AI capabilities in terminal environments through software engineering, system administration, and data processing tasks.
A dual-control conversational AI benchmark simulating technical support scenarios where both agent and user must coordinate actions to resolve telecom service issues.
A challenging benchmark measuring language models' ability to extract, reason about, and synthesize information from long-form documents ranging from 10k to 100k tokens (measured using the cl100k_base tokenizer).
An enhanced MMMU benchmark that eliminates shortcuts and guessing strategies to more rigorously test multimodal models across 30 academic disciplines.