
February 17, 2026

Qwen3.5-397B-A17B: Everything You Need to Know

Alibaba's new Qwen3.5-397B-A17B is the #3 open weights model in the Artificial Analysis Intelligence Index - a significant upgrade from Qwen3-235B-A22B-2507

Qwen3.5-397B-A17B is the first model released by Alibaba under the new Qwen3.5 family. It scores 45 on the Artificial Analysis Intelligence Index, ranking #3 among open weights models behind GLM-5 (Reasoning, 50) and Kimi K2.5 (Reasoning, 47). It has 397B/17B total/active parameters, significantly lower than peer models such as Kimi K2.5 (1T/32B), GLM-5 (744B/40B) and DeepSeek V3.2 (671B/37B).

Qwen3.5 397B is also the first Qwen open weights model with native vision input, supporting image and video inputs. Previously, Alibaba maintained separate model lines for vision (Qwen3-VL) and text-only (Qwen3). Qwen3.5 397B unifies these into a single model, following the broader industry trend toward natively multimodal foundation models.

Additionally, Qwen3.5 397B supports both reasoning and non-reasoning modes within a single model - a reversal from the Qwen3 family, where Alibaba released separate instruct and thinking variants.

Key takeaways from our independent benchmarking:

➤ 🧠 Intelligence gains driven by improved agentic performance: Qwen3.5 397B scores 45 on our Intelligence Index, a +16 point gain over the previous open weights Qwen3 235B (Reasoning, 29). Qwen3.5 397B achieves a GDPval-AA ELO of 1,221, a significant increase of 361 points over Qwen3 235B (860). GDPval-AA is a frontier agentic eval that compares model outputs on realistic knowledge work tasks such as preparing presentations and analyses. Qwen3.5 397B also improves over Qwen3 235B across agentic coding (+27 p.p. on TerminalBench Hard), scientific reasoning (+12 p.p. on HLE) and instruction following (+28 p.p. on IFBench).

➤ 📉 Hallucination remains higher than peers: Qwen3.5 397B's AA-Omniscience Index is -32, a 16-point improvement over Qwen3 235B (-48), driven primarily by higher accuracy (30% vs 22%) rather than a reduction in hallucination rate (88% vs 90%). The model still has a high hallucination rate relative to leading open weights models. We measure hallucination rate as how often the model answers a question when it should have refused or admitted to not knowing the answer. Kimi K2.5 and GLM-5 achieve an AA-Omniscience Index of -11 and -1 respectively.

➤ đŸĒ™ Slightly more token efficient than peers: Qwen3.5 397B used ~86M output tokens (including ~80M reasoning tokens) to run the Intelligence Index - more than Qwen3 235B (63M), but fewer than Kimi K2.5 (89M) and GLM-5 (110M).

Key Model Details:

  • 📏 Context window: 262K tokens

  • âš™ī¸ Size: 397B total / 17B active parameters (MoE). Fewer active parameters than Qwen3 235B (22B active), GLM-4.7 (32B active) and Kimi K2.5 (32B active).

  • ÂŠī¸ License: Apache 2.0.

  • 🌐 Availability: Qwen3.5 397B is available in Qwen Chat and via Alibaba's first-party API. Alibaba also offers Qwen3.5-Plus, a hosted variant with a 1M token context window and built-in tool use. No third-party API providers serve the model at the time of publishing. Weights are available on Hugging Face.

![Intelligence Index](Intelligence Index.png)

At 17B active parameters, Qwen3.5 397B is on the frontier of the Intelligence vs. Active Parameters chart

![Intelligence vs Active Parameters](Intelligence vs Active Parameters.png)

Based on Alibaba Cloud's per-token pricing ($0.60/$3.60 per 1M input/output tokens) and the number of tokens used to run the Intelligence Index, Qwen3.5 397B sits close to the Pareto frontier on the Intelligence vs. Cost to Run the Intelligence Index chart

![Intelligence vs Cost](Intelligence vs Cost.png)
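As a rough illustration of the cost arithmetic above - a sketch only, since input token counts are not reported here and so only the output side is estimated:

```python
# Sketch: output-side cost of running the Intelligence Index, using
# Alibaba Cloud's published output price. Input-side cost is omitted
# because the input token count is not given in this analysis.
PRICE_PER_1M_OUTPUT = 3.60       # USD per 1M output tokens
OUTPUT_TOKENS = 86_000_000       # ~86M output tokens used across the Index

output_cost = OUTPUT_TOKENS / 1_000_000 * PRICE_PER_1M_OUTPUT
print(f"Output-side cost: ${output_cost:.2f}")  # ~$309.60
```

The same arithmetic applied to Kimi K2.5 (~89M tokens) or GLM-5 (~110M tokens) at their respective prices is what places each model on the cost axis of the chart.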

Qwen3.5 397B performs significantly better than Qwen3 235B on GDPval-AA, an eval that measures model performance on real-world agentic tasks

![GDPval-AA](GDPval-AA.png)

Qwen3.5 397B improves on the AA-Omniscience Index compared to Qwen3 235B, with higher accuracy but limited improvement in hallucination rate

![Omniscience](Omniscience.png)
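The AA-Omniscience figures reported above are mutually consistent under a simple scoring scheme: +1 per correct answer, −1 per incorrect (hallucinated) answer, 0 per abstention. The formula below is an assumption inferred from the reported numbers, not a confirmed description of the official methodology:

```python
def omniscience_index(accuracy: float, hallucination_rate: float) -> int:
    """Assumed scoring: +1 correct, -1 incorrect, 0 abstained, in percentage points.

    hallucination_rate is the share of not-correctly-answered questions the
    model attempted anyway instead of refusing or admitting uncertainty.
    """
    incorrect = hallucination_rate * (1 - accuracy)  # attempted-and-wrong share
    return round(100 * (accuracy - incorrect))

print(omniscience_index(0.30, 0.88))  # Qwen3.5 397B -> -32
print(omniscience_index(0.22, 0.90))  # Qwen3 235B   -> -48
```

This makes the dynamic in the takeaway above concrete: Qwen3.5 397B's 16-point index gain comes almost entirely from the accuracy term, since the hallucination rate barely moved (88% vs 90%).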

Check out additional analysis for this model on X: https://x.com/ArtificialAnlys/status/2023794497055060262?s=20

Explore the full suite of benchmarks at https://artificialanalysis.ai/