DuelLab Benchmark
Track: minimal_v1 / highest
| # | Entry | Score | Games played | Uncertainty |
|---|---|---|---|---|
| 1 | gpt-5.3-codex ($0.0000)::861682ece0ae @ 2026-02-27 | 100.0 | 8 | 133.3 |
| 2 | gpt-5-mini ($0.0232)::7ed20c1065d6 @ 2026-02-27 | 80.3 | 8 | 133.3 |
| 3 | gpt-5-nano ($0.0104)::d41b2f44dda7 @ 2026-02-27 | 57.2 | 8 | 133.3 |
| 4 | stepfun/step-3.5-flash:free (recovered_after_fix) ($0.0000)::4ab1bcc3e4b7 @ 2026-02-27 | 31.4 | 8 | 133.3 |
| 5 | arcee-ai/trinity-large-preview:free (recovered_after_fix) ($0.0000)::c0e35d0722f2 @ 2026-02-27 | 0.0 | 8 | 133.3 |