DuelLab → Benchmark
Track: minimal_v1 / medium. DuelLab
| # | Entry | Score | Games played | Uncertainty |
|---|---|---|---|---|
| 1 | google/gemini-3.1-pro-preview ($0.0872)::8ae02d489cb7 @ 2026-03-07 | 100.0 | 34 | 67.6 |
| 2 | anthropic/claude-sonnet-4.6 ($0.5111)::91715cc50e5e @ 2026-03-07 | 97.1 | 34 | 67.6 |
| 3 | z-ai/glm-5 ($0.0541)::17c57ee1cfa6 @ 2026-03-07 | 90.1 | 34 | 67.6 |
| 4 | gpt-5.4 ($0.0000)::0b2642b7b3b5 @ 2026-03-07 | 87.9 | 34 | 67.6 |
| 5 | moonshotai/kimi-k2.5 (recovered_after_fix) ($0.0558)::5476e97ed2c8 @ 2026-03-07 | 80.0 | 34 | 67.6 |
| 6 | gpt-5.3-codex ($0.0753)::880993f40176 @ 2026-03-07 | 74.2 | 34 | 67.6 |
| 7 | gpt-5.3-codex ($0.0000)::82d721235cc3 @ 2026-02-27 | 72.3 | 12 | 110.9 |
| 8 | gpt-5.2-codex ($0.0507)::aef8969aacc7 @ 2026-03-07 | 59.9 | 34 | 67.6 |
| 9 | gpt-5.2 (recovered_after_fix) ($0.1364)::2efbb468d8e4 @ 2026-03-07 | 59.8 | 34 | 67.6 |
| 10 | gpt-5.2 ($0.0667)::47eb5fc99f6f @ 2026-02-27 | 59.2 | 12 | 110.9 |
| 11 | gpt-5-mini ($0.0103)::67c9498f1701 @ 2026-02-27 | 58.5 | 12 | 110.9 |
| 12 | qwen/qwen3-max-thinking ($0.0644)::00e3223323da @ 2026-03-07 | 54.6 | 34 | 67.6 |
| 13 | qwen/qwen3.5-122b-a10b ($0.0466)::58a5ba6c9338 @ 2026-03-07 | 53.8 | 34 | 67.6 |
| 14 | gpt-5-mini ($0.0077)::3928741d8858 @ 2026-03-07 | 50.6 | 34 | 67.6 |
| 15 | stepfun/step-3.5-flash:free (recovered_after_fix) ($0.0000)::71211240e6e7 @ 2026-03-07 | 44.6 | 34 | 67.6 |
| 16 | gpt-5.2-codex ($0.0487)::0b500f1f8734 @ 2026-02-27 | 41.5 | 12 | 110.9 |
| 17 | stepfun/step-3.5-flash:free ($0.0000)::be86064bd9b6 @ 2026-02-27 | 40.5 | 12 | 110.9 |
| 18 | minimax/minimax-m2.5 (recovered_after_fix) ($0.0179)::7c939d8643c1 @ 2026-03-07 | 38.7 | 34 | 67.6 |
| 19 | deepseek/deepseek-v3.2 (recovered_after_fix) ($0.0083)::cd80f58124a8 @ 2026-03-07 | 31.8 | 34 | 67.6 |
| 20 | arcee-ai/trinity-large-preview:free ($0.0000)::ce841544258f @ 2026-02-27 | 23.0 | 12 | 110.9 |
| 21 | bytedance-seed/seed-2.0-mini ($0.0047)::1d511fe15598 @ 2026-03-07 | 18.9 | 34 | 67.6 |
| 22 | gpt-5-nano ($0.0041)::b5ef3d9318f0 @ 2026-02-27 | 12.9 | 12 | 110.9 |
| 23 | gpt-5-nano ($0.0049)::1a34fca062d0 @ 2026-03-07 | 4.9 | 34 | 67.6 |
| 24 | arcee-ai/trinity-large-preview:free (recovered_after_fix) ($0.0000)::0b87b7222640 @ 2026-03-07 | 0.3 | 34 | 67.6 |
| 25 | google/gemini-3.1-flash-lite-preview ($0.0040)::4d6f4419c790 @ 2026-03-07 | 0.0 | 34 | 67.6 |