Leaderboard

Highest reasoning

This leaderboard covers runs at the highest reasoning level. Models are ranked by the performance of their generated game-playing programs after compilation and execution across the games in this benchmark.

Model leaderboard

One row per model; Min–Max is the score range across that model's evaluated entries at this reasoning level. Admitted entrants without match history remain in the table with a zero score until their first evaluation.
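The aggregation described above can be sketched in a few lines. This is a hypothetical illustration, not the benchmark's actual code: the function name `leaderboard_row` and the assumption that each model's results arrive as a plain list of per-entry scores are mine.

```python
def leaderboard_row(model: str, scores: list[float]) -> dict:
    """Build one leaderboard row from a model's per-entry scores.

    An admitted entrant with no match history yet gets a zero score,
    matching the rule described in the text.
    """
    if not scores:
        return {"model": model, "avg": 0.0, "min": 0.0, "max": 0.0, "entries": 0}
    return {
        "model": model,
        "avg": round(sum(scores) / len(scores), 1),
        "min": min(scores),   # low end of the Min–Max column
        "max": max(scores),   # high end of the Min–Max column
        "entries": len(scores),
    }
```

For example, a model with two evaluated entries scoring 40.0 and 60.0 would show an average of 50.0 and a Min–Max range of 40.0 – 60.0.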

Reasoning level: Highest · Games: 8 · Build: Preview
Highest reasoning leaderboard for DuelLab Benchmark
| Model | Avg score | Min–Max | Entries |
|---|---|---|---|
| Gemini 3.1 Pro Preview | 76.3 | 45.3 – 100.0 | 7 |
| GPT-5.4 | 76.2 | 36.9 – 100.0 | 16 |
| Claude Opus 4.6 | 66.4 | 39.5 – 89.4 | 14 |
| GPT-5.2 | 66.2 | 57.7 – 74.6 | 2 |
| GPT-5.3 Codex | 62.9 | 23.4 – 81.8 | 8 |
| GPT-5.4 Mini | 61.9 | 42.1 – 95.0 | 3 |
| GPT-5.4 Nano | 60.5 | 18.7 – 99.1 | 13 |
| Claude Sonnet 4.6 | 55.2 | 14.7 – 100.0 | 6 |
| Minimax M2.7 | 50.3 | 11.3 – 70.2 | 9 |
| Qwen3 Max Thinking | 49.0 | 0.0 – 98.0 | 2 |
| GLM-5 | 48.3 | 10.6 – 83.0 | 7 |
| Step 3.5 Flash | 46.9 | 24.5 – 66.8 | 3 |
| Kimi K2.5 | 46.4 | 18.5 – 97.4 | 7 |
| DeepSeek V3.2 | 44.3 | 19.6 – 70.6 | 7 |
| GPT-5 Mini | 42.8 | 15.2 – 93.7 | 8 |
| MiMo-V2-Pro | 39.0 | 0.0 – 83.4 | 15 |
| Gemini 2.5 Flash | 38.9 | 0.0 – 77.9 | 8 |
| Minimax M2.5 | 38.5 | 8.2 – 90.1 | 7 |
| Mistral Small 2603 | 36.7 | 0.0 – 86.3 | 8 |
| Nemotron 3 Super | 36.3 | 0.0 – 84.4 | 6 |
| Gemini 3 Flash Preview | 34.7 | 13.3 – 81.6 | 7 |
| Gemini 3.1 Flash Lite Preview | 30.3 | 6.6 – 61.6 | 7 |
| GPT-5 Nano | 30.1 | 10.1 – 68.2 | 8 |
| Qwen3.5 122B A10B | 29.6 | 7.3 – 51.9 | 2 |
| MiMo-V2-Omni | 24.8 | 12.4 – 44.7 | 7 |
| Trinity Large Preview | 0.0 | 0.0 | 2 |