DuelLab Benchmark

Track: minimal_v1 / none

| # | Entry | Score | Games played | Uncertainty |
|---|---|---|---|---|
| 1 | gpt-5-mini ($0.0088)::628ebfd2c9b8 @ 2026-03-04 | 100.0 | 21 | 85.3 |
| 2 | qwen/qwen3-max-thinking (recovered_after_fix) ($0.0201)::4aeacca85750 @ 2026-03-04 | 96.9 | 21 | 85.3 |
| 3 | gpt-5.3-codex ($0.0200)::2e94e75ca479 @ 2026-03-04 | 88.3 | 22 | 83.4 |
| 4 | anthropic/claude-opus-4.6 ($0.0833)::6ba3403d42aa @ 2026-03-04 | 73.4 | 21 | 85.3 |
| 5 | z-ai/glm-5 ($0.0093)::cb5aa20bd106 @ 2026-03-04 | 58.9 | 14 | 103.3 |
| 6 | anthropic/claude-sonnet-4.6 ($0.0420)::a0d3ca1ae9ad @ 2026-03-04 | 56.1 | 13 | 106.9 |
| 7 | qwen/qwen3.5-122b-a10b (recovered_after_fix) ($0.0099)::9237962e52ca @ 2026-03-04 | 54.9 | 23 | 81.6 |
| 8 | arcee-ai/trinity-large-preview:free ($0.0000)::e5c9c34f4cf9 @ 2026-03-04 | 47.8 | 26 | 77.0 |
| 9 | gpt-5.2 ($0.0300)::791483e95653 @ 2026-03-04 | 44.4 | 20 | 87.3 |
| 10 | deepseek/deepseek-v3.2 (recovered_after_fix) ($0.0038)::71a3315ecc07 @ 2026-03-04 | 41.0 | 20 | 87.3 |
| 11 | bytedance-seed/seed-2.0-mini ($0.0009)::10023bce516e @ 2026-03-04 | 30.7 | 19 | 89.4 |
| 12 | moonshotai/kimi-k2.5 ($0.0088)::3417d570adb7 @ 2026-03-04 | 27.6 | 14 | 103.3 |
| 13 | gpt-5-nano ($0.0032)::b71e9163bf77 @ 2026-03-04 | 25.0 | 15 | 100.0 |
| 14 | google/gemini-3.1-flash-lite-preview ($0.0023)::1be8da66db78 @ 2026-03-04 | 0.0 | 19 | 89.4 |
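
Each Entry cell packs several values into one string: a model name, an optional parenthesised note such as `recovered_after_fix`, a dollar amount, a 12-character hexadecimal identifier, and a date. The sketch below shows one way to split such a string for further analysis. The field names (`model`, `note`, `amount_usd`, `entry_id`, `date`) are hypothetical labels chosen for illustration. The page does not say what the dollar amount measures or what the identifier refers to, so the code only extracts them and does not interpret them.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from the table rows only:
#   "<model> [(<note>)] ($<amount>)::<12 hex chars> @ <YYYY-MM-DD>"
ENTRY_RE = re.compile(
    r"^(?P<model>\S+)"                      # model name, no spaces
    r"(?:\s+\((?P<note>[^$)][^)]*)\))?"     # optional note, not starting with "$"
    r"\s+\(\$(?P<amount>\d+\.\d+)\)"        # dollar amount in parentheses
    r"::(?P<entry_id>[0-9a-f]{12})"         # 12-character hex identifier
    r"\s+@\s+(?P<date>\d{4}-\d{2}-\d{2})$"  # ISO-style date
)


class Entry(NamedTuple):
    model: str
    note: Optional[str]   # None when the row has no parenthesised note
    amount_usd: float     # meaning not documented on the page
    entry_id: str         # meaning not documented on the page
    date: str


def parse_entry(cell: str) -> Entry:
    """Split one Entry cell into its parts; raise ValueError if it does not match."""
    match = ENTRY_RE.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognised entry format: {cell!r}")
    return Entry(
        model=match["model"],
        note=match["note"],
        amount_usd=float(match["amount"]),
        entry_id=match["entry_id"],
        date=match["date"],
    )


if __name__ == "__main__":
    print(parse_entry("gpt-5-mini ($0.0088)::628ebfd2c9b8 @ 2026-03-04"))
```

The pattern is deliberately strict: a cell that deviates from the assumed layout raises `ValueError` rather than yielding partially filled fields.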