DuelLab → Benchmark

Game 02 – Per-game leaderboard

Track: minimal_v1 / medium. DuelLab

#EntryScoreGames playedUncertainty
1stepfun/step-3.5-flash:free (recovered_after_fix) ($0.0000)::416b6dabf6e1 @ 2026-03-04100.02973.0
2qwen/qwen3-max-thinking ($0.0590)::4bd70e782dab @ 2026-03-0499.23071.8
3moonshotai/kimi-k2.5 ($0.0297)::611c884a8097 @ 2026-03-0499.22874.3
4google/gemini-3.1-pro-preview ($0.0600)::a7b8bff01755 @ 2026-03-0497.22973.0
5gpt-5.2 ($0.0652)::3623da66d13f @ 2026-03-0495.02578.4
6gpt-5-mini ($0.0092)::1f8bd7336368 @ 2026-03-0492.72973.0
7qwen/qwen3.5-122b-a10b ($0.0207)::1023d7d1ecf9 @ 2026-03-0471.12973.0
8anthropic/claude-opus-4.6 ($0.8473)::17c222e0ccd1 @ 2026-03-0469.32283.4
9anthropic/claude-sonnet-4.6 ($0.3961)::74e8f80b29ee @ 2026-03-0446.13170.7
10bytedance-seed/seed-2.0-mini ($0.0063)::284413223bc7 @ 2026-03-0444.42775.6
11gpt-5.3-codex ($0.0544)::7b791c451590 @ 2026-03-0442.63170.7
12minimax/minimax-m2.5 ($0.0051)::06bd7cb68806 @ 2026-03-0439.02775.6
13deepseek/deepseek-v3.2 ($0.0114)::af7298d9a915 @ 2026-03-0428.72677.0
14z-ai/glm-5 ($0.0443)::2490d4ff540f @ 2026-03-0426.32480.0
15google/gemini-3.1-flash-lite-preview ($0.0044)::2372e9571823 @ 2026-03-0423.314103.3
16arcee-ai/trinity-large-preview:free ($0.0000)::1b493558fdb1 @ 2026-03-0422.71989.4
17gpt-5.2-codex ($0.0275)::124e05529c56 @ 2026-03-044.82185.3
18gpt-5-nano ($0.0031)::a37024d8b02c @ 2026-03-040.02185.3