ai/qwen3.5

Verified Publisher

By Docker

Updated 6 days ago

397B MoE model with 17B activation for reasoning, coding, agents, and multimodal understanding


ai/qwen3.5 repository overview

Qwen3.5

Qwen3.5 represents a significant advancement in foundation models, delivering exceptional utility and performance through breakthroughs in multimodal learning, architectural efficiency, and global accessibility. The flagship Qwen3.5-397B-A17B model features 397 billion total parameters with 17 billion activated parameters using a sparse Mixture-of-Experts architecture, achieving state-of-the-art results across reasoning, coding, agents, and visual understanding tasks.

This model integrates unified vision-language capabilities through early fusion training on multimodal tokens, achieving cross-generational parity with text-focused Qwen3 models while surpassing previous Qwen3-VL models. The efficient hybrid architecture combines Gated Delta Networks with sparse Mixture-of-Experts to deliver high-throughput inference with minimal latency and cost overhead.
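To make the "10 routed + 1 shared active" expert layout concrete, the sketch below shows toy top-k router logic in NumPy. This is an illustration of sparse MoE routing in general, not Qwen3.5's actual implementation; only the expert counts (512 experts, 10 routed per token) are taken from this card.

```python
import numpy as np

def route_tokens(router_logits, k=10):
    """Toy top-k MoE router: each token is dispatched to its k
    highest-scoring experts (a shared expert would run for every
    token unconditionally). Gate weights are a softmax over the
    selected logits only."""
    idx = np.argsort(router_logits, axis=-1)[:, -k:]   # top-k expert ids per token
    top = np.take_along_axis(router_logits, idx, axis=-1)
    gates = np.exp(top - top.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)         # normalized gate weights
    return idx, gates

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 512))   # 4 tokens, 512 experts
idx, gates = route_tokens(logits)
# each token activates 10 of 512 routed experts (~2% of the pool),
# which is how 397B total parameters yield only 17B active per token
```

Because only the selected experts run per token, compute cost tracks the activated parameter count rather than the total.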

Qwen3.5 provides expanded multilingual support for 201 languages and dialects, enabling inclusive worldwide deployment with nuanced cultural and regional understanding. The model's reinforcement learning approach scales across million-agent environments with progressively complex task distributions for robust real-world adaptability.


Characteristics

| Attribute | Value |
|---|---|
| Provider | Qwen / Alibaba Cloud |
| Architecture | Mixture-of-Experts (512 experts, 10 routed + 1 shared active) |
| Total Parameters | 397B (17B activated) |
| Context Length | 262,144 tokens (extensible to 1,010,000) |
| Languages | 201 languages and dialects |
| Input modalities | Text, Image |
| Output modalities | Text |
| License | Apache 2.0 |

Using this model with Docker Model Runner

docker model run ai/qwen3.5

For more information, check out the Docker Model Runner docs.
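A typical local workflow is sketched below. The `docker model pull` and `docker model run` commands are standard Docker Model Runner usage; the `curl` call assumes host-side TCP access to the runner's OpenAI-compatible API is enabled on its default port (12434), which depends on your Docker configuration.

```shell
# Pull a specific quantized tag instead of the default
docker model pull ai/qwen3.5:9B-UD-Q4_K_XL

# One-shot prompt from the CLI
docker model run ai/qwen3.5 "Summarize the trade-offs of sparse MoE models."

# With TCP host access enabled (assumed default port 12434), the
# OpenAI-compatible chat endpoint can be called directly:
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ai/qwen3.5", "messages": [{"role": "user", "content": "Hello"}]}'
```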

Benchmarks

Benchmark Overview

Knowledge

| Benchmark | GPT5.2 | Claude 4.5 Opus | Gemini-3 Pro | Qwen3-Max-Thinking | K2.5-1T-A32B | Qwen3.5-397B-A17B |
|---|---|---|---|---|---|---|
| MMLU-Pro | 87.4 | 89.5 | 89.8 | 85.7 | 87.1 | 87.8 |
| MMLU-Redux | 95.0 | 95.6 | 95.9 | 92.8 | 94.5 | 94.9 |
| SuperGPQA | 67.9 | 70.6 | 74.0 | 67.3 | 69.2 | 70.4 |
| C-Eval | 90.5 | 92.2 | 93.4 | 93.7 | 94.0 | 93.0 |

Instruction Following

| Benchmark | GPT5.2 | Claude 4.5 Opus | Gemini-3 Pro | Qwen3-Max-Thinking | K2.5-1T-A32B | Qwen3.5-397B-A17B |
|---|---|---|---|---|---|---|
| IFEval | 94.8 | 90.9 | 93.5 | 93.4 | 93.9 | 92.6 |
| IFBench | 75.4 | 58.0 | 70.4 | 70.9 | 70.2 | 76.5 |
| MultiChallenge | 57.9 | 54.2 | 64.2 | 63.3 | 62.7 | 67.6 |

Long Context

| Benchmark | GPT5.2 | Claude 4.5 Opus | Gemini-3 Pro | Qwen3-Max-Thinking | K2.5-1T-A32B | Qwen3.5-397B-A17B |
|---|---|---|---|---|---|---|
| AA-LCR | 72.7 | 74.0 | 70.7 | 68.7 | 70.0 | 68.7 |
| LongBench v2 | 54.5 | 64.4 | 68.2 | 60.6 | 61.0 | 63.2 |

STEM

| Benchmark | GPT5.2 | Claude 4.5 Opus | Gemini-3 Pro | Qwen3-Max-Thinking | K2.5-1T-A32B | Qwen3.5-397B-A17B |
|---|---|---|---|---|---|---|
| GPQA | 92.4 | 87.0 | 91.9 | 87.4 | 87.6 | 88.4 |
| HLE | 35.5 | 30.8 | 37.5 | 30.2 | 30.1 | 28.7 |

Reasoning

| Benchmark | GPT5.2 | Claude 4.5 Opus | Gemini-3 Pro | Qwen3-Max-Thinking | K2.5-1T-A32B | Qwen3.5-397B-A17B |
|---|---|---|---|---|---|---|
| LiveCodeBench v6 | 87.7 | 84.8 | 90.7 | 85.9 | 85.0 | 83.6 |
| HMMT Feb 25 | 99.4 | 92.9 | 97.3 | 98.0 | 95.4 | 94.8 |
| HMMT Nov 25 | 100.0 | 93.3 | 93.3 | 94.7 | 91.1 | 92.7 |
| IMOAnswerBench | 86.3 | 84.0 | 83.3 | 83.9 | 81.8 | 80.9 |
| AIME26 | 96.7 | 93.3 | 90.6 | 93.3 | 93.3 | 91.3 |

General Agent

| Benchmark | GPT5.2 | Claude 4.5 Opus | Gemini-3 Pro | Qwen3-Max-Thinking | K2.5-1T-A32B | Qwen3.5-397B-A17B |
|---|---|---|---|---|---|---|
| BFCL-V4 | 63.1 | 77.5 | 72.5 | 67.7 | 68.3 | 72.9 |
| TAU2-Bench | 87.1 | 91.6 | 85.4 | 84.6 | 77.0 | 86.7 |
| VITA-Bench | 38.2 | 56.3 | 51.6 | 40.9 | 41.9 | 49.7 |
| DeepPlanning | 44.6 | 33.9 | 23.3 | 28.7 | 14.5 | 34.3 |
| Tool Decathlon | 43.8 | 43.5 | 36.4 | 18.8 | 27.8 | 38.3 |
| MCP-Mark | 57.5 | 42.3 | 53.9 | 33.5 | 29.5 | 46.1 |

Search Agent

| Benchmark | GPT5.2 | Claude 4.5 Opus | Gemini-3 Pro | Qwen3-Max-Thinking | K2.5-1T-A32B | Qwen3.5-397B-A17B |
|---|---|---|---|---|---|---|
| HLE w/ tool | 45.5 | 43.4 | 45.8 | 49.8 | 50.2 | 48.3 |
| BrowseComp | 65.8 | 67.8 | 59.2 | 53.9 | -- | 69.0 |
| WideSearch | 76.8 | 76.4 | 68.0 | 57.9 | 72.7 | 74.0 |
| Seal-0 | 45.0 | 47.7 | 45.5 | 46.9 | 57.4 | 46.9 |

Multilingualism

| Benchmark | GPT5.2 | Claude 4.5 Opus | Gemini-3 Pro | Qwen3-Max-Thinking | K2.5-1T-A32B | Qwen3.5-397B-A17B |
|---|---|---|---|---|---|---|
| MMMLU | 89.5 | 90.1 | 90.6 | 84.4 | 86.0 | 88.5 |

Considerations

  • The model requires substantial computational resources; quantized versions (Q2_K_XL through Q8_0) are available for different hardware configurations
  • Native context length is 262K tokens; extended context up to 1M tokens may require additional configuration
  • This repository contains GGUF quantized versions optimized by Unsloth for efficient local inference
  • The model features a sparse MoE architecture activating only 17B of 397B parameters per token, balancing performance with efficiency
  • Multimodal capabilities support both text and image inputs through unified vision-language training
  • Vision capabilities require separate mmproj model files (included in repository)
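As a rough sanity check on quantized tag sizes, a GGUF file scales with parameter count times bits per weight. The ~10% overhead factor below is an assumption (covering embeddings, norms, and metadata kept at higher precision), not a published figure.

```python
def gguf_size_gb(params_b, bits_per_weight, overhead=1.1):
    """Rough GGUF file size in GB: parameters x bits/8, plus an
    assumed ~10% overhead for higher-precision tensors and metadata."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9 * overhead

# e.g. a 9B variant at ~4.5 bits/weight (Q4_K-class quantization)
print(round(gguf_size_gb(9, 4.5), 1))  # -> 5.6
```

This lines up with the 5.6 GB size listed for the `9B-UD-Q4_K_XL` tag; the full 397B model at the same bit width would be well over 200 GB, which is why the quantized variants matter for local use.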
Generated by

This model card was automatically generated using cagent-action. Want to learn more about Docker Model Runner? Check out the project repository: https://github.com/docker/model-runner.

Tag summary

Content type: Model

Digest: sha256:c6c5be279

Size: 5.6 GB

Last updated: 6 days ago

docker model pull ai/qwen3.5:9B-UD-Q4_K_XL

Pulls this week: 5,584