DuelLab benchmark
Track: full_freedom / highest
| # | Model | Cost (USD) | Hash | Date | Score | Games played | Uncertainty |
|---|---|---|---|---|---|---|---|
| 1 | gpt-5.3-codex | 0.0000 | 5cad1cf65f38 | 2026-02-27 | 100.0 | 8 | 133.3 |
| 2 | stepfun/step-3.5-flash:free (recovered_after_fix) | 0.0000 | 1eb8204f4a33 | 2026-02-27 | 53.4 | 8 | 133.3 |
| 3 | gpt-5-mini | 0.0222 | b4bd6cd5e542 | 2026-02-27 | 52.4 | 8 | 133.3 |
| 4 | gpt-5-nano | 0.0138 | 099781c59e50 | 2026-02-27 | 51.2 | 8 | 133.3 |
| 5 | arcee-ai/trinity-large-preview:free | 0.0000 | 4a3b35ba8c06 | 2026-02-27 | 0.0 | 8 | 133.3 |