DuelLab Benchmark
Rankings from code-generation tournaments on a hidden game suite.
The Overall track shows the average score across all six mode tracks; it has no per-game breakdown.
One row per model family; Min–Max is the score range across that family's dated entries in this track. Every row in the table below has a single entry, so Min–Max collapses to one value.
| Model family | Avg score | Min–Max | Entries |
|---|---|---|---|
| stepfun/step-3.5-flash:free (recovered_after_fix) ($0.0000)::416b6dabf6e1 | 100.0 | 100.0 | 1 |
| anthropic/claude-opus-4.6 ($0.0872)::38244ecbece9 | 100.0 | 100.0 | 1 |
| gpt-5.3-codex ($0.0000)::5cad1cf65f38 | 100.0 | 100.0 | 1 |
| gpt-5.3-codex ($0.0000)::861682ece0ae | 100.0 | 100.0 | 1 |
| google/gemini-3.1-pro-preview ($0.0872)::8ae02d489cb7 | 100.0 | 100.0 | 1 |
| qwen/qwen3-max-thinking ($0.0620)::17ebf1a0f415 | 100.0 | 100.0 | 1 |
| entrant_009_google--gemini-3-flash-preview::8c114917b8e3 | 100.0 | 100.0 | 1 |
| gpt-5-mini ($0.0088)::628ebfd2c9b8 | 100.0 | 100.0 | 1 |
| minimax/minimax-m2.5 ($0.0147)::37e6d2ed8e10 | 100.0 | 100.0 | 1 |
| gpt-5.2 ($0.0811)::2a48c6945db1 | 100.0 | 100.0 | 1 |
| deepseek/deepseek-v3.2 (recovered_after_fix) ($0.0064)::7530590d3a37 | 100.0 | 100.0 | 1 |
| minimax/minimax-m2.5 ($0.0150)::17d86923861b | 99.2 | 99.2 | 1 |
| qwen/qwen3-max-thinking ($0.0590)::4bd70e782dab | 99.2 | 99.2 | 1 |
| moonshotai/kimi-k2.5 ($0.0297)::611c884a8097 | 99.2 | 99.2 | 1 |
| moonshotai/kimi-k2.5 ($0.0075)::16a604294ab9 | 98.8 | 98.8 | 1 |
| entrant_009_google--gemini-3-flash-preview::6d1c5f498355 | 98.3 | 98.3 | 1 |
| qwen/qwen3-max-thinking ($0.0547)::244dbd3a5223 | 98.0 | 98.0 | 1 |
| google/gemini-3.1-pro-preview ($0.0600)::a7b8bff01755 | 97.2 | 97.2 | 1 |
| anthropic/claude-sonnet-4.6 ($0.5111)::91715cc50e5e | 97.1 | 97.1 | 1 |
| anthropic/claude-sonnet-4.6 ($0.0478)::92d03786e77c | 97.1 | 97.1 | 1 |
| qwen/qwen3-max-thinking (recovered_after_fix) ($0.0201)::4aeacca85750 | 96.9 | 96.9 | 1 |
| moonshotai/kimi-k2.5 ($0.0336)::0d7f59c95b3a | 95.9 | 95.9 | 1 |
| gpt-5-mini ($0.0200)::844f8cf45a4e | 95.7 | 95.7 | 1 |
| entrant_000_gpt-5.4::14ff6b6748de | 95.4 | 95.4 | 1 |
| gpt-5.2 ($0.0652)::3623da66d13f | 95.0 | 95.0 | 1 |
| stepfun/step-3.5-flash:free ($0.0000)::84575e982123 | 94.6 | 94.6 | 1 |
| anthropic/claude-sonnet-4.6 ($0.7898)::7e165f96dbae | 93.5 | 93.5 | 1 |
| gpt-5-mini ($0.0092)::1f8bd7336368 | 92.7 | 92.7 | 1 |
| entrant_007_z-ai--glm-5::b0dd13061084 | 92.7 | 92.7 | 1 |
| gpt-5.3-codex ($0.0000)::3d8ddcce263a | 91.2 | 91.2 | 1 |
| entrant_005_google--gemini-3.1-pro-preview::2e4d06f52910 | 91.0 | 91.0 | 1 |
| entrant_006_z-ai--glm-5::a2b617f85cfd | 90.9 | 90.9 | 1 |
| moonshotai/kimi-k2.5 ($0.0222)::4f4e1bffc0d6 | 90.7 | 90.7 | 1 |
| entrant_007_z-ai--glm-5::17c57ee1cfa6 | 90.7 | 90.7 | 1 |
| entrant_000_gpt-5.2::2a48c6945db1 | 90.5 | 90.5 | 1 |
| entrant_004_gpt-5.3-codex::861682ece0ae | 90.2 | 90.2 | 1 |
| z-ai/glm-5 ($0.0541)::17c57ee1cfa6 | 90.1 | 90.1 | 1 |
| qwen/qwen3-max-thinking ($0.0514)::0a63458392d1 | 89.6 | 89.6 | 1 |
| entrant_006_google--gemini-3.1-pro-preview | 89.6 | 89.6 | 1 |
| entrant_004_gpt-5.3-codex::9154767998bf | 89.5 | 89.5 | 1 |
| entrant_011_moonshotai--kimi-k2.5::25225f273b28 | 88.5 | 88.5 | 1 |
| gpt-5.3-codex ($0.0200)::2e94e75ca479 | 88.3 | 88.3 | 1 |
| entrant_012_moonshotai--kimi-k2.5::04fe201a22c6 | 88.2 | 88.2 | 1 |
| gpt-5.4 ($0.0000)::0b2642b7b3b5 | 87.9 | 87.9 | 1 |
| entrant_004_gpt-5.3-codex::c811aa176b62 | 87.4 | 87.4 | 1 |
| entrant_009_google--gemini-3-flash-preview::625ea5d044cd | 87.2 | 87.2 | 1 |
| entrant_000_gpt-5.4::4e789543bfc8 | 87.1 | 87.1 | 1 |
| entrant_004_gpt-5.3-codex::5980a9c19f87 | 87.1 | 87.1 | 1 |
| entrant_000_gpt-5.4::0b2642b7b3b5 | 86.8 | 86.8 | 1 |
| entrant_005_google--gemini-3.1-pro-preview::8ebe96e65980 | 86.6 | 86.6 | 1 |
| gpt-5.2-codex ($0.0446)::2252f948c0cf | 86.6 | 86.6 | 1 |
| stepfun/step-3.5-flash:free ($0.0000)::3dbf666dcbd0 | 86.6 | 86.6 | 1 |
| entrant_014_anthropic--claude-sonnet-4.6::01fce6ceed12 | 86.3 | 86.3 | 1 |
| z-ai/glm-5 ($0.0437)::00201bb03a01 | 86.2 | 86.2 | 1 |
| entrant_014_anthropic--claude-opus-4.6::b0f54ac64a0f | 85.3 | 85.3 | 1 |
| entrant_014_anthropic--claude-sonnet-4.6::0f0c803f0b38 | 85.0 | 85.0 | 1 |
| deepseek/deepseek-v3.2 ($0.0019)::ae5a9e1f9410 | 84.8 | 84.8 | 1 |
| entrant_006_z-ai--glm-5::4bae2d47b34c | 84.8 | 84.8 | 1 |
| gpt-5-mini ($0.0076)::058b46859b5d | 84.7 | 84.7 | 1 |
| entrant_004_gpt-5.3-codex::5cad1cf65f38 | 84.7 | 84.7 | 1 |
| qwen/qwen3-max-thinking ($0.0085)::ada2e8493ea1 | 84.6 | 84.6 | 1 |
| entrant_000_gpt-5.2::381b51bd0a04 | 84.4 | 84.4 | 1 |
| entrant_013_anthropic--claude-opus-4.6::38244ecbece9 | 84.0 | 84.0 | 1 |
| z-ai/glm-5 ($0.0371)::cb0020652f27 | 83.9 | 83.9 | 1 |
| entrant_013_anthropic--claude-opus-4.6::5c868b25a52f | 83.8 | 83.8 | 1 |
| entrant_011_moonshotai--kimi-k2.5::97459f7b08ce | 83.3 | 83.3 | 1 |
| entrant_013_anthropic--claude-opus-4.6::17c222e0ccd1 | 82.2 | 82.2 | 1 |
| entrant_015_anthropic--claude-sonnet-4.6::91715cc50e5e | 82.1 | 82.1 | 1 |
| gpt-5.2-codex ($0.0695)::00da108f1d3c | 80.9 | 80.9 | 1 |
| entrant_006_z-ai--glm-5::16b2342fb880 | 80.9 | 80.9 | 1 |
| gpt-5-mini ($0.0232)::7ed20c1065d6 | 80.3 | 80.3 | 1 |
| moonshotai/kimi-k2.5 (recovered_after_fix) ($0.0558)::5476e97ed2c8 | 80.0 | 80.0 | 1 |
| gpt-5.2 (recovered_after_fix) ($0.0915)::661c421e12a5 | 80.0 | 80.0 | 1 |
| entrant_014_anthropic--claude-opus-4.6::9a62dcbb8a3b | 79.6 | 79.6 | 1 |
| z-ai/glm-5 ($0.0481)::44808dece37d | 78.8 | 78.8 | 1 |
| entrant_014_anthropic--claude-sonnet-4.6::b8daeba4a7cf | 78.5 | 78.5 | 1 |
| entrant_014_anthropic--claude-sonnet-4.6::71d6f6447cbc | 78.5 | 78.5 | 1 |
| entrant_013_anthropic--claude-opus-4.6::6035e544b9be | 78.3 | 78.3 | 1 |
| entrant_013_anthropic--claude-opus-4.6::01029ef54314 | 78.3 | 78.3 | 1 |
| entrant_012_moonshotai--kimi-k2.5::5476e97ed2c8 | 77.8 | 77.8 | 1 |
| google/gemini-3.1-pro-preview ($0.0708)::066d0848caff | 77.7 | 77.7 | 1 |
| arcee-ai/trinity-large-preview:free ($0.0000)::29c62944fbd3 | 74.8 | 74.8 | 1 |
| gpt-5.2-codex ($0.4983)::ab71abbabbae | 74.4 | 74.4 | 1 |
| gpt-5.3-codex ($0.0753)::880993f40176 | 74.2 | 74.2 | 1 |
| entrant_015_anthropic--claude-sonnet-4.6::09e756834eda | 73.5 | 73.5 | 1 |
| anthropic/claude-opus-4.6 ($0.0833)::6ba3403d42aa | 73.4 | 73.4 | 1 |
| entrant_012_moonshotai--kimi-k2.5::0e4c4b3371ea | 72.7 | 72.7 | 1 |
| entrant_005_gpt-5.3-codex::880993f40176 | 72.6 | 72.6 | 1 |
| entrant_004_gpt-5.3-codex::3d8ddcce263a | 72.3 | 72.3 | 1 |
| gpt-5.3-codex ($0.0000)::82d721235cc3 | 72.3 | 72.3 | 1 |
| entrant_007_z-ai--glm-5::60475863ae30 | 71.3 | 71.3 | 1 |
| qwen/qwen3.5-122b-a10b ($0.0207)::1023d7d1ecf9 | 71.1 | 71.1 | 1 |
| entrant_004_gpt-5.3-codex::26c58495c5e9 | 70.1 | 70.1 | 1 |
| anthropic/claude-opus-4.6 ($0.8473)::17c222e0ccd1 | 69.3 | 69.3 | 1 |
| entrant_005_google--gemini-3.1-pro-preview::53822cf06dda | 69.2 | 69.2 | 1 |
| entrant_006_z-ai--glm-5::24b8f171bca6 | 68.8 | 68.8 | 1 |
| entrant_000_gpt-5.2::34ff3d41d915 | 68.6 | 68.6 | 1 |
| gpt-5-mini ($0.0175)::2af654aceacc | 66.9 | 66.9 | 1 |
| entrant_007_qwen--qwen3-max-thinking::f438fd18faee | 66.5 | 66.5 | 1 |
| entrant_009_google--gemini-3-flash-preview::2db816dc429f | 66.0 | 66.0 | 1 |
| gpt-5-nano ($0.0103)::21d869229d89 | 65.3 | 65.3 | 1 |
| entrant_000_gpt-5.4::18fd032f43d1 | 64.6 | 64.6 | 1 |
| gpt-5.3-codex ($0.0617)::15ca78810d8f | 63.9 | 63.9 | 1 |
| gpt-5.3-codex ($0.4748)::1399bc429a50 | 62.1 | 62.1 | 1 |
| anthropic/claude-opus-4.6 ($0.0846)::9a62dcbb8a3b | 61.7 | 61.7 | 1 |
| gpt-5.4 ($0.0000)::14ff6b6748de | 61.7 | 61.7 | 1 |
| moonshotai/kimi-k2.5 ($0.0325)::75c2cc06f5f9 | 61.5 | 61.5 | 1 |
| anthropic/claude-opus-4.6 ($0.0875)::b0f54ac64a0f | 60.9 | 60.9 | 1 |
| gpt-5.2-codex ($0.0507)::aef8969aacc7 | 59.9 | 59.9 | 1 |
| gpt-5.2 (recovered_after_fix) ($0.1364)::2efbb468d8e4 | 59.8 | 59.8 | 1 |
| z-ai/glm-5 ($0.0116)::b0dd13061084 | 59.7 | 59.7 | 1 |
| anthropic/claude-sonnet-4.6 ($0.0579)::09e756834eda | 59.5 | 59.5 | 1 |
| gpt-5.3-codex (recovered_after_fix) ($0.5605)::e954ca523560 | 59.5 | 59.5 | 1 |
| gpt-5.2 ($0.0667)::47eb5fc99f6f | 59.2 | 59.2 | 1 |
| moonshotai/kimi-k2.5 (recovered_after_fix) ($0.0209)::04fe201a22c6 | 59.2 | 59.2 | 1 |
| z-ai/glm-5 ($0.0093)::cb5aa20bd106 | 58.9 | 58.9 | 1 |
| google/gemini-3.1-pro-preview ($0.3446)::37db7ffea127 | 58.9 | 58.9 | 1 |
| entrant_005_gpt-5.3-codex::665c73d58b45 | 58.8 | 58.8 | 1 |
| entrant_004_gpt-5.3-codex::82d721235cc3 | 58.8 | 58.8 | 1 |
| entrant_003_gpt-5.2-codex::00da108f1d3c | 58.8 | 58.8 | 1 |
| gpt-5-mini ($0.0103)::67c9498f1701 | 58.5 | 58.5 | 1 |
| entrant_000_gpt-5.2::7d23fd327111 | 58.5 | 58.5 | 1 |
| entrant_001_gpt-5.2::9e6a8333c618 | 58.3 | 58.3 | 1 |
| entrant_004_gpt-5.3-codex::60fc1c213ad8 | 58.2 | 58.2 | 1 |
| qwen/qwen3.5-122b-a10b ($0.0250)::3a876f4663d4 | 58.2 | 58.2 | 1 |
| entrant_000_gpt-5.2::5e3db3fbd34c | 58.1 | 58.1 | 1 |
| entrant_001_gpt-5.2::a02047939ea9 | 58.0 | 58.0 | 1 |
| entrant_004_gpt-5.2-codex | 57.4 | 57.4 | 1 |
| entrant_019_bytedance-seed--seed-2.0-mini::513ee872c075 | 57.2 | 57.2 | 1 |
| gpt-5-nano ($0.0104)::d41b2f44dda7 | 57.2 | 57.2 | 1 |
| entrant_005_gpt-5.3-codex::057f7457c82d | 57.1 | 57.1 | 1 |
| entrant_001_gpt-5-mini::b4bd6cd5e542 | 56.7 | 56.7 | 1 |
| entrant_001_gpt-5.2::2efbb468d8e4 | 56.6 | 56.6 | 1 |
| entrant_008_qwen--qwen3.5-122b-a10b::11c15e22c8e8 | 56.6 | 56.6 | 1 |
| entrant_009_qwen--qwen3.5-122b-a10b::58a5ba6c9338 | 56.5 | 56.5 | 1 |
| entrant_004_gpt-5.3-codex::5ca5a945609f | 56.4 | 56.4 | 1 |
| entrant_008_qwen--qwen3-max-thinking::00e3223323da | 56.3 | 56.3 | 1 |
| stepfun/step-3.5-flash:free ($0.0000)::b4370bd94d70 | 56.2 | 56.2 | 1 |
| entrant_017_stepfun--step-3.5-flash_free | 56.1 | 56.1 | 1 |
| anthropic/claude-sonnet-4.6 ($0.0420)::a0d3ca1ae9ad | 56.1 | 56.1 | 1 |
| entrant_016_stepfun--step-3.5-flash_free::1eb8204f4a33 | 56.0 | 56.0 | 1 |
| entrant_003_gpt-5.2-codex::5f0a5b2529a5 | 55.9 | 55.9 | 1 |
| entrant_000_gpt-5.2::761693f14061 | 55.7 | 55.7 | 1 |
| entrant_001_gpt-5-mini::87821c3c85b1 | 55.7 | 55.7 | 1 |
| z-ai/glm-5 ($0.0108)::5f72c0eb881c | 55.5 | 55.5 | 1 |
| moonshotai/kimi-k2.5 ($0.0046)::0e4c4b3371ea | 55.1 | 55.1 | 1 |
| qwen/qwen3.5-122b-a10b (recovered_after_fix) ($0.0099)::9237962e52ca | 54.9 | 54.9 | 1 |
| qwen/qwen3-max-thinking ($0.0644)::00e3223323da | 54.6 | 54.6 | 1 |
| entrant_009_google--gemini-3-flash-preview::6ac5fff628cd | 54.5 | 54.5 | 1 |
| entrant_001_gpt-5-mini::7ed20c1065d6 | 54.3 | 54.3 | 1 |
| entrant_001_gpt-5-mini::ad1d783a4c70 | 54.3 | 54.3 | 1 |
| entrant_001_gpt-5-mini::8f38c4d9855c | 54.2 | 54.2 | 1 |
| entrant_002_gpt-5-mini | 54.0 | 54.0 | 1 |
| entrant_002_gpt-5-nano::4e47419a7589 | 53.9 | 53.9 | 1 |
| entrant_001_gpt-5-mini::2822279cbf1a | 53.8 | 53.8 | 1 |
| qwen/qwen3.5-122b-a10b ($0.0466)::58a5ba6c9338 | 53.8 | 53.8 | 1 |
| entrant_004_gpt-5.3-codex::1cbbac7a4039 | 53.7 | 53.7 | 1 |
| stepfun/step-3.5-flash:free (recovered_after_fix) ($0.0000)::1eb8204f4a33 | 53.4 | 53.4 | 1 |
| entrant_001_gpt-5-mini::0c8fab332113 | 53.1 | 53.1 | 1 |
| z-ai/glm-5 ($0.0102)::60475863ae30 | 52.9 | 52.9 | 1 |
| gpt-5.3-codex ($0.0266)::86838dd03471 | 52.7 | 52.7 | 1 |
| entrant_002_gpt-5-nano::099781c59e50 | 52.7 | 52.7 | 1 |
| qwen/qwen3.5-122b-a10b ($0.0434)::71dca6c97f92 | 52.5 | 52.5 | 1 |
| entrant_008_qwen--qwen3.5-122b-a10b::0241c2460b90 | 52.5 | 52.5 | 1 |
| gpt-5-mini ($0.0222)::b4bd6cd5e542 | 52.4 | 52.4 | 1 |
| bytedance-seed/seed-2.0-mini ($0.0010)::513ee872c075 | 51.8 | 51.8 | 1 |
| google/gemini-3.1-pro-preview (recovered_after_fix) ($0.3999)::5540d6ab37a8 | 51.5 | 51.5 | 1 |
| gpt-5-nano ($0.0138)::099781c59e50 | 51.2 | 51.2 | 1 |
| gpt-5.3-codex ($0.0259)::665c73d58b45 | 51.1 | 51.1 | 1 |
| gpt-5.4 ($0.0000)::18fd032f43d1 | 51.0 | 51.0 | 1 |
| google/gemini-3.1-flash-lite-preview ($0.0032)::b0ae954bb34a | 51.0 | 51.0 | 1 |
| gpt-5-mini ($0.0077)::3928741d8858 | 50.6 | 50.6 | 1 |
| entrant_001_gpt-5-mini::67c9498f1701 | 50.2 | 50.2 | 1 |
| entrant_011_moonshotai--kimi-k2.5::6b9555b535cf | 49.8 | 49.8 | 1 |
| entrant_016_stepfun--step-3.5-flash_free::2aa14e16a463 | 49.6 | 49.6 | 1 |
| entrant_001_gpt-5-mini::5bf4759e7c02 | 49.4 | 49.4 | 1 |
| gpt-5.3-codex ($0.0000)::26c58495c5e9 | 48.8 | 48.8 | 1 |
| entrant_006_z-ai--glm-5::432a47fdb873 | 47.9 | 47.9 | 1 |
| arcee-ai/trinity-large-preview:free ($0.0000)::e5c9c34f4cf9 | 47.8 | 47.8 | 1 |
| gpt-5.2 ($0.0430)::39826a082fb0 | 47.3 | 47.3 | 1 |
| entrant_011_minimax--minimax-m2.5 | 47.3 | 47.3 | 1 |
| gpt-5.2 ($0.0432)::a02047939ea9 | 47.3 | 47.3 | 1 |
| gpt-5.3-codex ($0.0325)::057f7457c82d | 47.2 | 47.2 | 1 |
| gpt-5.2 ($0.0319)::9e6a8333c618 | 47.1 | 47.1 | 1 |
| gpt-5-mini ($0.0087)::87821c3c85b1 | 46.5 | 46.5 | 1 |
| anthropic/claude-sonnet-4.6 ($0.3961)::74e8f80b29ee | 46.1 | 46.1 | 1 |
| gpt-5.2 ($0.0396)::761693f14061 | 45.7 | 45.7 | 1 |
| gpt-5.3-codex ($0.0000)::1cbbac7a4039 | 45.6 | 45.6 | 1 |
| entrant_011_moonshotai--kimi-k2.5::86974ddeead2 | 45.6 | 45.6 | 1 |
| google/gemini-3.1-flash-lite-preview ($0.0027)::745448837948 | 45.4 | 45.4 | 1 |
| entrant_000_gpt-5.2::47eb5fc99f6f | 45.3 | 45.3 | 1 |
| gpt-5-nano ($0.0025)::34dfa9a03ec2 | 45.3 | 45.3 | 1 |
| entrant_015_anthropic--claude-sonnet-4.6::8eeefac1ec17 | 44.9 | 44.9 | 1 |
| stepfun/step-3.5-flash:free (recovered_after_fix) ($0.0000)::71211240e6e7 | 44.6 | 44.6 | 1 |
| anthropic/claude-sonnet-4.6 ($0.0648)::8eeefac1ec17 | 44.5 | 44.5 | 1 |
| gpt-5.2 ($0.0300)::791483e95653 | 44.4 | 44.4 | 1 |
| bytedance-seed/seed-2.0-mini ($0.0063)::284413223bc7 | 44.4 | 44.4 | 1 |
| entrant_000_gpt-5.2::39826a082fb0 | 43.8 | 43.8 | 1 |
| qwen/qwen3.5-122b-a10b (recovered_after_fix) ($0.0132)::b1f8ca87ed0a | 43.4 | 43.4 | 1 |
| gpt-5-mini ($0.0090)::5bf4759e7c02 | 43.0 | 43.0 | 1 |
| stepfun/step-3.5-flash:free ($0.0000)::2aa14e16a463 | 42.9 | 42.9 | 1 |
| gpt-5.3-codex ($0.0544)::7b791c451590 | 42.6 | 42.6 | 1 |
| entrant_000_gpt-5.4::7e297e7b9118 | 42.2 | 42.2 | 1 |
| entrant_012_deepseek--deepseek-v3.2::babb7f633345 | 41.9 | 41.9 | 1 |
| google/gemini-3.1-flash-lite-preview ($0.0125)::c096dda29618 | 41.7 | 41.7 | 1 |
| gpt-5.2-codex ($0.0487)::0b500f1f8734 | 41.5 | 41.5 | 1 |
| deepseek/deepseek-v3.2 (recovered_after_fix) ($0.0038)::71a3315ecc07 | 41.0 | 41.0 | 1 |
| entrant_010_google--gemini-3.1-flash-lite-preview::745448837948 | 40.6 | 40.6 | 1 |
| entrant_013_deepseek--deepseek-v3.2::cd80f58124a8 | 40.5 | 40.5 | 1 |
| stepfun/step-3.5-flash:free ($0.0000)::be86064bd9b6 | 40.5 | 40.5 | 1 |
| arcee-ai/trinity-large-preview:free ($0.0000)::16bb68f624ee | 39.9 | 39.9 | 1 |
| anthropic/claude-sonnet-4.6 ($0.3750)::263e91e37c96 | 39.5 | 39.5 | 1 |
| entrant_013_deepseek--deepseek-v3.2::708fac99e5dc | 39.5 | 39.5 | 1 |
| entrant_016_arcee-ai--trinity-large-preview_free::b15c0f016557 | 39.2 | 39.2 | 1 |
| minimax/minimax-m2.5 ($0.0051)::06bd7cb68806 | 39.0 | 39.0 | 1 |
| minimax/minimax-m2.5 (recovered_after_fix) ($0.0179)::7c939d8643c1 | 38.7 | 38.7 | 1 |
| deepseek/deepseek-v3.2 ($0.0018)::708fac99e5dc | 38.7 | 38.7 | 1 |
| arcee-ai/trinity-large-preview:free (recovered_after_fix) ($0.0000)::b15c0f016557 | 38.4 | 38.4 | 1 |
| entrant_007_qwen--qwen3-max-thinking::45cf191da6b2 | 38.3 | 38.3 | 1 |
| gpt-5-nano ($0.0076)::03769244b16e | 37.7 | 37.7 | 1 |
| entrant_012_deepseek--deepseek-v3.2::b09f4a5411ae | 37.7 | 37.7 | 1 |
| google/gemini-3.1-flash-lite-preview ($0.0020)::5ed71f0ce79a | 37.4 | 37.4 | 1 |
| gpt-5-mini ($0.0110)::8581fe62e905 | 37.4 | 37.4 | 1 |
| entrant_001_gpt-5-mini::9643aa170276 | 37.1 | 37.1 | 1 |
| arcee-ai/trinity-large-preview:free (recovered_after_fix) ($0.0000)::42448db4449b | 36.7 | 36.7 | 1 |
| entrant_008_qwen--qwen3.5-122b-a10b::af4bb1a03d77 | 36.6 | 36.6 | 1 |
| entrant_001_gpt-5-mini::048e9bf281bb | 36.5 | 36.5 | 1 |
| entrant_002_gpt-5-nano::9d755956a0f6 | 36.3 | 36.3 | 1 |
| qwen/qwen3-max-thinking (recovered_after_fix) ($0.0199)::83206da24217 | 36.3 | 36.3 | 1 |
| gpt-5-nano ($0.0070)::3b80bc411288 | 36.0 | 36.0 | 1 |
| entrant_010_minimax--minimax-m2.5::e7794d25f07b | 35.4 | 35.4 | 1 |
| gpt-5-nano ($0.0055)::62315ee296bc | 35.4 | 35.4 | 1 |
| deepseek/deepseek-v3.2 ($0.0018)::0638cde804dc | 35.1 | 35.1 | 1 |
| entrant_019_bytedance-seed--seed-2.0-mini::1d511fe15598 | 35.0 | 35.0 | 1 |
| entrant_015_arcee-ai--trinity-large-preview_free::09894b1bd9ea | 34.5 | 34.5 | 1 |
| google/gemini-3.1-flash-lite-preview ($0.0169)::652b4056c583 | 34.4 | 34.4 | 1 |
| qwen/qwen3.5-122b-a10b ($0.0038)::2b25ee71d64d | 33.8 | 33.8 | 1 |
| entrant_019_bytedance-seed--seed-2.0-mini::20eb0e240e4a | 33.5 | 33.5 | 1 |
| entrant_002_gpt-5-nano::03769244b16e | 32.3 | 32.3 | 1 |
| qwen/qwen3-max-thinking ($0.0086)::99446e67ec0f | 31.9 | 31.9 | 1 |
| deepseek/deepseek-v3.2 (recovered_after_fix) ($0.0083)::cd80f58124a8 | 31.8 | 31.8 | 1 |
| anthropic/claude-opus-4.6 ($0.7125)::01029ef54314 | 31.6 | 31.6 | 1 |
| stepfun/step-3.5-flash:free (recovered_after_fix) ($0.0000)::4ab1bcc3e4b7 | 31.4 | 31.4 | 1 |
| deepseek/deepseek-v3.2 ($0.0032)::7b6db8a35def | 31.1 | 31.1 | 1 |
| entrant_002_gpt-5-nano::edc6e99823b9 | 30.9 | 30.9 | 1 |
| bytedance-seed/seed-2.0-mini ($0.0009)::10023bce516e | 30.7 | 30.7 | 1 |
| anthropic/claude-sonnet-4.6 ($0.6293)::1c1d04ac560e | 30.6 | 30.6 | 1 |
| entrant_016_stepfun--step-3.5-flash_free::be86064bd9b6 | 29.1 | 29.1 | 1 |
| entrant_007_qwen--qwen3-max-thinking::fe1be3eb2268 | 28.7 | 28.7 | 1 |
| deepseek/deepseek-v3.2 ($0.0033)::301ceb9d61df | 28.7 | 28.7 | 1 |
| deepseek/deepseek-v3.2 ($0.0114)::af7298d9a915 | 28.7 | 28.7 | 1 |
| entrant_013_deepseek--deepseek-v3.2::0638cde804dc | 28.4 | 28.4 | 1 |
| entrant_015_arcee-ai--trinity-large-preview_free::16bb68f624ee | 28.2 | 28.2 | 1 |
| entrant_007_qwen--qwen3-max-thinking::352e53cd1449 | 28.1 | 28.1 | 1 |
| entrant_007_qwen--qwen3-max-thinking::44e3d89d6410 | 28.1 | 28.1 | 1 |
| moonshotai/kimi-k2.5 ($0.0088)::3417d570adb7 | 27.6 | 27.6 | 1 |
| entrant_015_arcee-ai--trinity-large-preview_free::4a3b35ba8c06 | 27.5 | 27.5 | 1 |
| qwen/qwen3.5-122b-a10b ($0.0646)::43c91e963cbe | 27.3 | 27.3 | 1 |
| entrant_002_gpt-5-nano::d41b2f44dda7 | 26.4 | 26.4 | 1 |
| z-ai/glm-5 ($0.0443)::2490d4ff540f | 26.3 | 26.3 | 1 |
| entrant_003_gpt-5.2-codex::557237351b91 | 26.1 | 26.1 | 1 |
| entrant_016_stepfun--step-3.5-flash_free::c36f05dc9ad2 | 25.8 | 25.8 | 1 |
| minimax/minimax-m2.5 ($0.0130)::33656ecfc86a | 25.5 | 25.5 | 1 |
| gpt-5-nano ($0.0032)::b71e9163bf77 | 25.0 | 25.0 | 1 |
| entrant_015_arcee-ai--trinity-large-preview_free::0ace044aeb44 | 24.8 | 24.8 | 1 |
| entrant_002_gpt-5-nano::168b4641c9d2 | 24.7 | 24.7 | 1 |
| google/gemini-3.1-flash-lite-preview (recovered_after_fix) ($0.0079)::5fa1aa40c3fd | 24.5 | 24.5 | 1 |
| entrant_003_gpt-5.2-codex::0b500f1f8734 | 24.5 | 24.5 | 1 |
| entrant_016_stepfun--step-3.5-flash_free::4ab1bcc3e4b7 | 24.3 | 24.3 | 1 |
| entrant_016_arcee-ai--trinity-large-preview_free::42448db4449b | 24.3 | 24.3 | 1 |
| entrant_016_arcee-ai--trinity-large-preview_free::0b87b7222640 | 24.1 | 24.1 | 1 |
| entrant_008_qwen--qwen3.5-122b-a10b::4dfac77a88dd | 23.3 | 23.3 | 1 |
| google/gemini-3.1-flash-lite-preview ($0.0044)::2372e9571823 | 23.3 | 23.3 | 1 |
| arcee-ai/trinity-large-preview:free ($0.0000)::ce841544258f | 23.0 | 23.0 | 1 |
| gpt-5-mini ($0.0097)::048e9bf281bb | 22.9 | 22.9 | 1 |
| arcee-ai/trinity-large-preview:free ($0.0000)::1b493558fdb1 | 22.7 | 22.7 | 1 |
| entrant_015_arcee-ai--trinity-large-preview_free::ce841544258f | 20.4 | 20.4 | 1 |
| bytedance-seed/seed-2.0-mini (recovered_after_fix) ($0.0050)::1091b348e996 | 20.4 | 20.4 | 1 |
| entrant_009_qwen--qwen3.5-122b-a10b::b1f8ca87ed0a | 20.3 | 20.3 | 1 |
| entrant_015_arcee-ai--trinity-large-preview_free::545a42bbbd09 | 18.9 | 18.9 | 1 |
| bytedance-seed/seed-2.0-mini ($0.0047)::1d511fe15598 | 18.9 | 18.9 | 1 |
| entrant_010_google--gemini-3.1-flash-lite-preview::4d6f4419c790 | 17.6 | 17.6 | 1 |
| gpt-5-nano ($0.0058)::edc6e99823b9 | 17.3 | 17.3 | 1 |
| entrant_015_arcee-ai--trinity-large-preview_free::c0e35d0722f2 | 17.1 | 17.1 | 1 |
| entrant_010_minimax--minimax-m2.5::856d0f4c9892 | 16.9 | 16.9 | 1 |
| entrant_010_google--gemini-3.1-flash-lite-preview::5ed71f0ce79a | 16.2 | 16.2 | 1 |
| arcee-ai/trinity-large-preview:free (recovered_after_fix) ($0.0000)::1b9e3f0b2b30 | 14.2 | 14.2 | 1 |
| gpt-5-nano ($0.0041)::b5ef3d9318f0 | 12.9 | 12.9 | 1 |
| arcee-ai/trinity-large-preview:free (recovered_after_fix) ($0.0000)::b0e21c8cc606 | 12.1 | 12.1 | 1 |
| entrant_008_qwen--qwen3-max-thinking::99446e67ec0f | 12.0 | 12.0 | 1 |
| entrant_002_gpt-5-nano::b5ef3d9318f0 | 11.9 | 11.9 | 1 |
| entrant_003_gpt-5-nano | 10.9 | 10.9 | 1 |
| entrant_015_arcee-ai--trinity-large-preview_free::0f8a48b690b6 | 10.7 | 10.7 | 1 |
| entrant_010_minimax--minimax-m2.5::80374b7181ce | 9.9 | 9.9 | 1 |
| entrant_008_qwen--qwen3-max-thinking::83206da24217 | 9.6 | 9.6 | 1 |
| entrant_002_gpt-5-nano::04639b45a655 | 9.2 | 9.2 | 1 |
| entrant_008_qwen--qwen3.5-122b-a10b::547f7c89c067 | 7.6 | 7.6 | 1 |
| entrant_012_deepseek--deepseek-v3.2::1516bc091028 | 6.0 | 6.0 | 1 |
| entrant_009_qwen--qwen3.5-122b-a10b::2b25ee71d64d | 5.7 | 5.7 | 1 |
| entrant_002_gpt-5-nano::3b80bc411288 | 4.9 | 4.9 | 1 |
| gpt-5-nano ($0.0049)::1a34fca062d0 | 4.9 | 4.9 | 1 |
| gpt-5.2-codex ($0.0275)::124e05529c56 | 4.8 | 4.8 | 1 |
| entrant_016_stepfun--step-3.5-flash_free::57027fa97bfc | 4.1 | 4.1 | 1 |
| entrant_002_gpt-5-nano::681d5465556b | 3.8 | 3.8 | 1 |
| entrant_013_anthropic--claude-opus-4.6::6ba3403d42aa | 3.4 | 3.4 | 1 |
| bytedance-seed/seed-2.0-mini ($0.0062)::9c565cec5a53 | 2.9 | 2.9 | 1 |
| entrant_015_arcee-ai--trinity-large-preview_free::9a5c7e5c7b07 | 1.0 | 1.0 | 1 |
| arcee-ai/trinity-large-preview:free (recovered_after_fix) ($0.0000)::0b87b7222640 | 0.3 | 0.3 | 1 |
| gpt-5-nano (recovered_after_fix) ($0.0065)::7b7318670453 | 0.0 | 0.0 | 1 |
| arcee-ai/trinity-large-preview:free ($0.0000)::545a42bbbd09 | 0.0 | 0.0 | 1 |
| google/gemini-3.1-flash-lite-preview ($0.0023)::1be8da66db78 | 0.0 | 0.0 | 1 |
| google/gemini-3.1-flash-lite-preview ($0.0040)::4d6f4419c790 | 0.0 | 0.0 | 1 |
| gpt-5-nano ($0.0031)::a37024d8b02c | 0.0 | 0.0 | 1 |
| qwen/qwen3.5-122b-a10b (recovered_after_fix) ($0.0115)::140e17a0d40b | 0.0 | 0.0 | 1 |
| arcee-ai/trinity-large-preview:free ($0.0000)::4a3b35ba8c06 | 0.0 | 0.0 | 1 |
| arcee-ai/trinity-large-preview:free (recovered_after_fix) ($0.0000)::c0e35d0722f2 | 0.0 | 0.0 | 1 |
| arcee-ai/trinity-large-preview:free ($0.0000)::682f10efa6e9 | 0.0 | 0.0 | 1 |
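
The family columns described above the table (Avg score, Min–Max, Entries) can be sketched as a simple group-by over dated entries. This is a minimal illustration, not DuelLab's actual pipeline: `family_key` and `summarize` are hypothetical names, and the rule for what counts as a "family" (dropping the date, the `::hash` suffix, the `($cost)` tag and the `(recovered_after_fix)` flag) is an assumption inferred from the label format in the tables.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed label layout: "<model> [(recovered_after_fix)] [($cost)][::<12-hex hash>] [@ YYYY-MM-DD]".
# These patterns are guesses based on the labels shown in the tables, not a documented format.
_DATE = re.compile(r"\s*@\s*\d{4}-\d{2}-\d{2}$")
_HASH = re.compile(r"::[0-9a-f]{12}$")
_COST = re.compile(r"\s*\(\$\d+\.\d+\)")
_FLAG = re.compile(r"\s*\(recovered_after_fix\)")


def family_key(label: str) -> str:
    """Reduce an entry label to an assumed family name (hypothetical rule)."""
    label = _DATE.sub("", label.strip())
    label = _HASH.sub("", label)
    label = _COST.sub("", label)
    label = _FLAG.sub("", label)
    return label.strip()


def summarize(entries):
    """Group (label, score) pairs by family; return rows sorted by average score, descending."""
    groups = defaultdict(list)
    for label, score in entries:
        groups[family_key(label)].append(score)
    rows = [
        {
            "family": family,
            "avg": round(mean(scores), 1),
            "min": min(scores),
            "max": max(scores),
            "entries": len(scores),
        }
        for family, scores in groups.items()
    ]
    return sorted(rows, key=lambda row: row["avg"], reverse=True)


# Three dated entries taken from the per-entry table.
sample = [
    ("gpt-5.2 ($0.0811)::2a48c6945db1 @ 2026-02-27", 100.0),
    ("gpt-5.2 ($0.0652)::3623da66d13f @ 2026-03-04", 95.0),
    ("gpt-5.2 (recovered_after_fix) ($0.0915)::661c421e12a5 @ 2026-03-04", 80.0),
]
rows = summarize(sample)
```

Under this assumed grouping, the three `gpt-5.2` entries collapse into one row with an average of 91.7, a range of 80.0 to 100.0, and 3 entries. The table above keeps the cost and hash in the family label, which is why every row there reports exactly one entry.
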

| # | Entry | Overall score | Coverage | Games played | Uncertainty (avg) |
|---|---|---|---|---|---|
| 1 | gpt-5.3-codex ($0.0000)::5cad1cf65f38 @ 2026-02-27 | 100.0 | under_tested | 8 | 133.3 |
| 2 | google/gemini-3.1-pro-preview ($0.0872)::8ae02d489cb7 @ 2026-03-07 | 100.0 | provisional | 34 | 67.6 |
| 3 | anthropic/claude-opus-4.6 ($0.0872)::38244ecbece9 @ 2026-03-04 | 100.0 | under_tested | 19 | 89.4 |
| 4 | gpt-5.3-codex ($0.0000)::861682ece0ae @ 2026-02-27 | 100.0 | under_tested | 8 | 133.3 |
| 5 | qwen/qwen3-max-thinking ($0.0620)::17ebf1a0f415 @ 2026-03-04 | 100.0 | under_tested | 24 | 80.0 |
| 6 | entrant_009_google--gemini-3-flash-preview::8c114917b8e3 @ 2026-03-07 | 100.0 | stable | 283 | 23.8 |
| 7 | deepseek/deepseek-v3.2 (recovered_after_fix) ($0.0064)::7530590d3a37 @ 2026-03-04 | 100.0 | under_tested | 27 | 75.6 |
| 8 | stepfun/step-3.5-flash:free (recovered_after_fix) ($0.0000)::416b6dabf6e1 @ 2026-03-04 | 100.0 | under_tested | 29 | 73.0 |
| 9 | gpt-5-mini ($0.0088)::628ebfd2c9b8 @ 2026-03-04 | 100.0 | under_tested | 21 | 85.3 |
| 10 | gpt-5.2 ($0.0811)::2a48c6945db1 @ 2026-02-27 | 100.0 | under_tested | 12 | 110.9 |
| 11 | minimax/minimax-m2.5 ($0.0147)::37e6d2ed8e10 @ 2026-03-04 | 100.0 | under_tested | 19 | 89.4 |
| 12 | minimax/minimax-m2.5 ($0.0150)::17d86923861b @ 2026-03-04 | 99.2 | under_tested | 24 | 80.0 |
| 13 | qwen/qwen3-max-thinking ($0.0590)::4bd70e782dab @ 2026-03-04 | 99.2 | provisional | 30 | 71.8 |
| 14 | moonshotai/kimi-k2.5 ($0.0297)::611c884a8097 @ 2026-03-04 | 99.2 | under_tested | 28 | 74.3 |
| 15 | moonshotai/kimi-k2.5 ($0.0075)::16a604294ab9 @ 2026-03-04 | 98.8 | under_tested | 19 | 89.4 |
| 16 | entrant_009_google--gemini-3-flash-preview::6d1c5f498355 @ 2026-03-07 | 98.3 | stable | 284 | 23.8 |
| 17 | qwen/qwen3-max-thinking ($0.0547)::244dbd3a5223 @ 2026-03-04 | 98.0 | under_tested | 25 | 78.4 |
| 18 | google/gemini-3.1-pro-preview ($0.0600)::a7b8bff01755 @ 2026-03-04 | 97.2 | under_tested | 29 | 73.0 |
| 19 | anthropic/claude-sonnet-4.6 ($0.5111)::91715cc50e5e @ 2026-03-07 | 97.1 | provisional | 34 | 67.6 |
| 20 | anthropic/claude-sonnet-4.6 ($0.0478)::92d03786e77c @ 2026-03-04 | 97.1 | under_tested | 21 | 85.3 |
| 21 | qwen/qwen3-max-thinking (recovered_after_fix) ($0.0201)::4aeacca85750 @ 2026-03-04 | 96.9 | under_tested | 21 | 85.3 |
| 22 | moonshotai/kimi-k2.5 ($0.0336)::0d7f59c95b3a @ 2026-03-04 | 95.9 | under_tested | 24 | 80.0 |
| 23 | gpt-5-mini ($0.0200)::844f8cf45a4e @ 2026-03-04 | 95.7 | under_tested | 18 | 91.8 |
| 24 | entrant_000_gpt-5.4::14ff6b6748de @ 2026-03-07 | 95.4 | stable | 284 | 23.8 |
| 25 | gpt-5.2 ($0.0652)::3623da66d13f @ 2026-03-04 | 95.0 | under_tested | 25 | 78.4 |
| 26 | stepfun/step-3.5-flash:free ($0.0000)::84575e982123 @ 2026-03-04 | 94.6 | under_tested | 24 | 80.0 |
| 27 | anthropic/claude-sonnet-4.6 ($0.7898)::7e165f96dbae @ 2026-03-04 | 93.5 | under_tested | 23 | 81.6 |
| 28 | gpt-5-mini ($0.0092)::1f8bd7336368 @ 2026-03-04 | 92.7 | under_tested | 29 | 73.0 |
| 29 | entrant_007_z-ai--glm-5::b0dd13061084 @ 2026-03-07 | 92.7 | stable | 284 | 23.8 |
| 30 | gpt-5.3-codex ($0.0000)::3d8ddcce263a @ 2026-02-27 | 91.2 | under_tested | 12 | 110.9 |
| 31 | entrant_005_google--gemini-3.1-pro-preview::2e4d06f52910 @ 2026-03-07 | 91.0 | stable | 283 | 23.8 |
| 32 | entrant_006_z-ai--glm-5::a2b617f85cfd @ 2026-03-07 | 90.9 | stable | 284 | 23.8 |
| 33 | moonshotai/kimi-k2.5 ($0.0222)::4f4e1bffc0d6 @ 2026-03-04 | 90.7 | under_tested | 22 | 83.4 |
| 34 | entrant_007_z-ai--glm-5::17c57ee1cfa6 @ 2026-03-07 | 90.7 | stable | 284 | 23.8 |
| 35 | entrant_000_gpt-5.2::2a48c6945db1 @ 2026-03-07 | 90.5 | stable | 284 | 23.8 |
| 36 | entrant_004_gpt-5.3-codex::861682ece0ae @ 2026-03-07 | 90.2 | stable | 284 | 23.8 |
| 37 | z-ai/glm-5 ($0.0541)::17c57ee1cfa6 @ 2026-03-07 | 90.1 | provisional | 34 | 67.6 |
| 38 | qwen/qwen3-max-thinking ($0.0514)::0a63458392d1 @ 2026-03-04 | 89.6 | under_tested | 26 | 77.0 |
| 39 | entrant_006_google--gemini-3.1-pro-preview @ 2026-03-07 | 89.6 | stable | 283 | 23.8 |
| 40 | entrant_004_gpt-5.3-codex::9154767998bf @ 2026-03-07 | 89.5 | stable | 283 | 23.8 |
| 41 | entrant_011_moonshotai--kimi-k2.5::25225f273b28 @ 2026-03-07 | 88.5 | stable | 284 | 23.8 |
| 42 | gpt-5.3-codex ($0.0200)::2e94e75ca479 @ 2026-03-04 | 88.3 | under_tested | 22 | 83.4 |
| 43 | entrant_012_moonshotai--kimi-k2.5::04fe201a22c6 @ 2026-03-07 | 88.2 | stable | 284 | 23.8 |
| 44 | gpt-5.4 ($0.0000)::0b2642b7b3b5 @ 2026-03-07 | 87.9 | provisional | 34 | 67.6 |
| 45 | entrant_004_gpt-5.3-codex::c811aa176b62 @ 2026-03-07 | 87.4 | stable | 283 | 23.8 |
| 46 | entrant_009_google--gemini-3-flash-preview::625ea5d044cd @ 2026-03-07 | 87.2 | stable | 284 | 23.8 |
| 47 | entrant_000_gpt-5.4::4e789543bfc8 @ 2026-03-07 | 87.1 | stable | 283 | 23.7 |
| 48 | entrant_004_gpt-5.3-codex::5980a9c19f87 @ 2026-03-07 | 87.1 | stable | 283 | 23.8 |
| 49 | entrant_000_gpt-5.4::0b2642b7b3b5 @ 2026-03-07 | 86.8 | stable | 284 | 23.8 |
| 50 | entrant_005_google--gemini-3.1-pro-preview::8ebe96e65980 @ 2026-03-07 | 86.6 | stable | 284 | 23.8 |
| 51 | gpt-5.2-codex ($0.0446)::2252f948c0cf @ 2026-03-04 | 86.6 | provisional | 30 | 71.8 |
| 52 | stepfun/step-3.5-flash:free ($0.0000)::3dbf666dcbd0 @ 2026-03-04 | 86.6 | under_tested | 27 | 75.6 |
| 53 | entrant_014_anthropic--claude-sonnet-4.6::01fce6ceed12 @ 2026-03-07 | 86.3 | stable | 284 | 23.8 |
| 54 | z-ai/glm-5 ($0.0437)::00201bb03a01 @ 2026-03-04 | 86.2 | under_tested | 24 | 80.0 |
| 55 | entrant_014_anthropic--claude-opus-4.6::b0f54ac64a0f @ 2026-03-07 | 85.3 | stable | 284 | 23.8 |
| 56 | entrant_014_anthropic--claude-sonnet-4.6::0f0c803f0b38 @ 2026-03-07 | 85.0 | stable | 283 | 23.8 |
| 57 | deepseek/deepseek-v3.2 ($0.0019)::ae5a9e1f9410 @ 2026-03-04 | 84.8 | under_tested | 18 | 91.8 |
| 58 | entrant_006_z-ai--glm-5::4bae2d47b34c @ 2026-03-07 | 84.8 | stable | 284 | 23.8 |
| 59 | gpt-5-mini ($0.0076)::058b46859b5d @ 2026-03-04 | 84.7 | provisional | 31 | 70.7 |
| 60 | entrant_004_gpt-5.3-codex::5cad1cf65f38 @ 2026-03-07 | 84.7 | stable | 285 | 23.8 |
| 61 | qwen/qwen3-max-thinking ($0.0085)::ada2e8493ea1 @ 2026-03-04 | 84.6 | under_tested | 22 | 83.4 |
| 62 | entrant_000_gpt-5.2::381b51bd0a04 @ 2026-03-07 | 84.4 | stable | 284 | 23.8 |
| 63 | entrant_013_anthropic--claude-opus-4.6::38244ecbece9 @ 2026-03-07 | 84.0 | under_tested | 24 | 80.0 |
| 64 | z-ai/glm-5 ($0.0371)::cb0020652f27 @ 2026-03-04 | 83.9 | under_tested | 25 | 78.4 |
| 65 | entrant_013_anthropic--claude-opus-4.6::5c868b25a52f @ 2026-03-07 | 83.8 | stable | 285 | 23.7 |
| 66 | entrant_011_moonshotai--kimi-k2.5::97459f7b08ce @ 2026-03-07 | 83.3 | stable | 285 | 23.7 |
| 67 | entrant_013_anthropic--claude-opus-4.6::17c222e0ccd1 @ 2026-03-07 | 82.2 | under_tested | 16 | 97.0 |
| 68 | entrant_015_anthropic--claude-sonnet-4.6::91715cc50e5e @ 2026-03-07 | 82.1 | stable | 284 | 23.8 |
| 69 | gpt-5.2-codex ($0.0695)::00da108f1d3c @ 2026-02-27 | 80.9 | under_tested | 12 | 110.9 |
| 70 | entrant_006_z-ai--glm-5::16b2342fb880 @ 2026-03-07 | 80.9 | stable | 284 | 23.8 |
| 71 | gpt-5-mini ($0.0232)::7ed20c1065d6 @ 2026-02-27 | 80.3 | under_tested | 8 | 133.3 |
| 72 | moonshotai/kimi-k2.5 (recovered_after_fix) ($0.0558)::5476e97ed2c8 @ 2026-03-07 | 80.0 | provisional | 34 | 67.6 |
| 73 | gpt-5.2 (recovered_after_fix) ($0.0915)::661c421e12a5 @ 2026-03-04 | 80.0 | under_tested | 29 | 73.0 |
| 74 | entrant_014_anthropic--claude-opus-4.6::9a62dcbb8a3b @ 2026-03-07 | 79.6 | stable | 284 | 23.8 |
| 75 | z-ai/glm-5 ($0.0481)::44808dece37d @ 2026-03-04 | 78.8 | under_tested | 24 | 80.0 |
| 76 | entrant_014_anthropic--claude-sonnet-4.6::b8daeba4a7cf @ 2026-03-07 | 78.5 | stable | 284 | 23.8 |
| 77 | entrant_014_anthropic--claude-sonnet-4.6::71d6f6447cbc @ 2026-03-07 | 78.5 | stable | 284 | 23.8 |
| 78 | entrant_013_anthropic--claude-opus-4.6::6035e544b9be @ 2026-03-07 | 78.3 | stable | 284 | 23.8 |
| 79 | entrant_013_anthropic--claude-opus-4.6::01029ef54314 @ 2026-03-07 | 78.3 | under_tested | 16 | 97.0 |
| 80 | entrant_012_moonshotai--kimi-k2.5::5476e97ed2c8 @ 2026-03-07 | 77.8 | stable | 284 | 23.8 |
| 81 | google/gemini-3.1-pro-preview ($0.0708)::066d0848caff @ 2026-03-04 | 77.7 | under_tested | 25 | 78.4 |
| 82 | arcee-ai/trinity-large-preview:free ($0.0000)::29c62944fbd3 @ 2026-03-04 | 74.8 | under_tested | 24 | 80.0 |
| 83 | gpt-5.2-codex ($0.4983)::ab71abbabbae @ 2026-03-04 | 74.4 | under_tested | 18 | 91.8 |
| 84 | gpt-5.3-codex ($0.0753)::880993f40176 @ 2026-03-07 | 74.2 | provisional | 34 | 67.6 |
| 85 | entrant_015_anthropic--claude-sonnet-4.6::09e756834eda @ 2026-03-07 | 73.5 | stable | 283 | 23.8 |
| 86 | anthropic/claude-opus-4.6 ($0.0833)::6ba3403d42aa @ 2026-03-04 | 73.4 | under_tested | 21 | 85.3 |
| 87 | entrant_012_moonshotai--kimi-k2.5::0e4c4b3371ea @ 2026-03-07 | 72.7 | stable | 283 | 23.8 |
| 88 | entrant_005_gpt-5.3-codex::880993f40176 @ 2026-03-07 | 72.6 | stable | 284 | 23.8 |
| 89 | entrant_004_gpt-5.3-codex::3d8ddcce263a @ 2026-03-07 | 72.3 | stable | 283 | 23.8 |
| 90 | gpt-5.3-codex ($0.0000)::82d721235cc3 @ 2026-02-27 | 72.3 | under_tested | 12 | 110.9 |
| 91 | entrant_007_z-ai--glm-5::60475863ae30 @ 2026-03-07 | 71.3 | stable | 284 | 23.8 |
| 92 | qwen/qwen3.5-122b-a10b ($0.0207)::1023d7d1ecf9 @ 2026-03-04 | 71.1 | under_tested | 29 | 73.0 |
| 93 | entrant_004_gpt-5.3-codex::26c58495c5e9 @ 2026-03-07 | 70.1 | stable | 284 | 23.8 |
| 94 | anthropic/claude-opus-4.6 ($0.8473)::17c222e0ccd1 @ 2026-03-04 | 69.3 | under_tested | 22 | 83.4 |
| 95 | entrant_005_google--gemini-3.1-pro-preview::53822cf06dda @ 2026-03-07 | 69.2 | stable | 283 | 23.8 |
| 96 | entrant_006_z-ai--glm-5::24b8f171bca6 @ 2026-03-07 | 68.8 | stable | 284 | 23.8 |
| 97 | entrant_000_gpt-5.2::34ff3d41d915 @ 2026-03-07 | 68.6 | under_tested | 32 | 69.6 |
| 98 | gpt-5-mini ($0.0175)::2af654aceacc @ 2026-03-04 | 66.9 | under_tested | 21 | 85.3 |
| 99 | entrant_007_qwen--qwen3-max-thinking::f438fd18faee @ 2026-03-07 | 66.5 | stable | 283 | 23.8 |
| 100 | entrant_009_google--gemini-3-flash-preview::2db816dc429f @ 2026-03-07 | 66.0 | stable | 283 | 23.8 |
| 101 | gpt-5-nano ($0.0103)::21d869229d89 @ 2026-03-04 | 65.3 | under_tested | 20 | 87.3 |
| 102 | entrant_000_gpt-5.4::18fd032f43d1 @ 2026-03-07 | 64.6 | stable | 283 | 23.8 |
| 103 | gpt-5.3-codex ($0.0617)::15ca78810d8f @ 2026-03-04 | 63.9 | under_tested | 28 | 74.3 |
| 104 | gpt-5.3-codex ($0.4748)::1399bc429a50 @ 2026-03-04 | 62.1 | under_tested | 17 | 94.3 |
| 105 | anthropic/claude-opus-4.6 ($0.0846)::9a62dcbb8a3b @ 2026-03-07 | 61.7 | under_tested | 24 | 80.0 |
| 106 | gpt-5.4 ($0.0000)::14ff6b6748de @ 2026-03-07 | 61.7 | under_tested | 24 | 80.0 |
| 107 | moonshotai/kimi-k2.5 ($0.0325)::75c2cc06f5f9 @ 2026-03-04 | 61.5 | under_tested | 24 | 80.0 |
| 108 | anthropic/claude-opus-4.6 ($0.0875)::b0f54ac64a0f @ 2026-03-07 | 60.9 | under_tested | 22 | 83.4 |
| 109 | gpt-5.2-codex ($0.0507)::aef8969aacc7 @ 2026-03-07 | 59.9 | provisional | 34 | 67.6 |
| 110 | gpt-5.2 (recovered_after_fix) ($0.1364)::2efbb468d8e4 @ 2026-03-07 | 59.8 | provisional | 34 | 67.6 |
| 111 | z-ai/glm-5 ($0.0116)::b0dd13061084 @ 2026-03-07 | 59.7 | under_tested | 22 | 83.4 |
| 112 | anthropic/claude-sonnet-4.6 ($0.0579)::09e756834eda @ 2026-03-07 | 59.5 | under_tested | 22 | 83.4 |
| 113 | gpt-5.3-codex (recovered_after_fix) ($0.5605)::e954ca523560 @ 2026-03-04 | 59.5 | under_tested | 26 | 77.0 |
| 114 | gpt-5.2 ($0.0667)::47eb5fc99f6f @ 2026-02-27 | 59.2 | under_tested | 12 | 110.9 |
| 115 | moonshotai/kimi-k2.5 (recovered_after_fix) ($0.0209)::04fe201a22c6 @ 2026-03-07 | 59.2 | under_tested | 24 | 80.0 |
| 116 | z-ai/glm-5 ($0.0093)::cb5aa20bd106 @ 2026-03-04 | 58.9 | under_tested | 14 | 103.3 |
| 117 | google/gemini-3.1-pro-preview ($0.3446)::37db7ffea127 @ 2026-03-04 | 58.9 | under_tested | 19 | 89.4 |
| 118 | entrant_005_gpt-5.3-codex::665c73d58b45 @ 2026-03-07 | 58.8 | stable | 282 | 23.8 |
| 119 | entrant_004_gpt-5.3-codex::82d721235cc3 @ 2026-03-07 | 58.8 | stable | 284 | 23.8 |
| 120 | entrant_003_gpt-5.2-codex::00da108f1d3c @ 2026-03-07 | 58.8 | stable | 283 | 23.8 |
| 121 | gpt-5-mini ($0.0103)::67c9498f1701 @ 2026-02-27 | 58.5 | under_tested | 12 | 110.9 |
| 122 | entrant_000_gpt-5.2::7d23fd327111 @ 2026-03-07 | 58.5 | stable | 283 | 23.8 |
| 123 | entrant_001_gpt-5.2::9e6a8333c618 @ 2026-03-07 | 58.3 | stable | 283 | 23.8 |
| 124 | entrant_004_gpt-5.3-codex::60fc1c213ad8 @ 2026-03-07 | 58.2 | stable | 282 | 23.9 |
| 125 | qwen/qwen3.5-122b-a10b ($0.0250)::3a876f4663d4 @ 2026-03-04 | 58.2 | under_tested | 26 | 77.0 |
| 126 | entrant_000_gpt-5.2::5e3db3fbd34c @ 2026-03-07 | 58.1 | stable | 284 | 23.8 |
| 127 | entrant_001_gpt-5.2::a02047939ea9 @ 2026-03-07 | 58.0 | stable | 283 | 23.8 |
| 128 | entrant_004_gpt-5.2-codex @ 2026-03-07 | 57.4 | stable | 283 | 23.8 |
| 129 | entrant_019_bytedance-seed--seed-2.0-mini::513ee872c075 @ 2026-03-07 | 57.2 | stable | 286 | 23.8 |
| 130 | gpt-5-nano ($0.0104)::d41b2f44dda7 @ 2026-02-27 | 57.2 | under_tested | 8 | 133.3 |
| 131 | entrant_005_gpt-5.3-codex::057f7457c82d @ 2026-03-07 | 57.1 | stable | 283 | 23.8 |
| 132 | entrant_001_gpt-5-mini::b4bd6cd5e542 @ 2026-03-07 | 56.7 | stable | 283 | 23.8 |
| 133 | entrant_001_gpt-5.2::2efbb468d8e4 @ 2026-03-07 | 56.6 | stable | 284 | 23.8 |
| 134 | entrant_008_qwen--qwen3.5-122b-a10b::11c15e22c8e8 @ 2026-03-07 | 56.6 | stable | 283 | 23.8 |
| 135 | entrant_009_qwen--qwen3.5-122b-a10b::58a5ba6c9338 @ 2026-03-07 | 56.5 | stable | 283 | 23.8 |
| 136 | entrant_004_gpt-5.3-codex::5ca5a945609f @ 2026-03-07 | 56.4 | stable | 284 | 23.8 |
| 137 | entrant_008_qwen--qwen3-max-thinking::00e3223323da @ 2026-03-07 | 56.3 | stable | 283 | 23.8 |
| 138 | stepfun/step-3.5-flash:free ($0.0000)::b4370bd94d70 @ 2026-03-04 | 56.2 | under_tested | 23 | 81.6 |
| 139 | entrant_017_stepfun--step-3.5-flash_free @ 2026-03-07 | 56.1 | stable | 284 | 23.8 |
| 140 | anthropic/claude-sonnet-4.6 ($0.0420)::a0d3ca1ae9ad @ 2026-03-04 | 56.1 | under_tested | 13 | 106.9 |
| 141 | entrant_016_stepfun--step-3.5-flash_free::1eb8204f4a33 @ 2026-03-07 | 56.0 | stable | 282 | 23.8 |
| 142 | entrant_003_gpt-5.2-codex::5f0a5b2529a5 @ 2026-03-07 | 55.9 | stable | 284 | 23.8 |
| 143 | entrant_000_gpt-5.2::761693f14061 @ 2026-03-07 | 55.7 | stable | 284 | 23.8 |
| 144 | entrant_001_gpt-5-mini::87821c3c85b1 @ 2026-03-07 | 55.7 | stable | 283 | 23.8 |
| 145 | z-ai/glm-5 ($0.0108)::5f72c0eb881c @ 2026-03-04 | 55.5 | under_tested | 22 | 83.4 |
| 146 | moonshotai/kimi-k2.5 ($0.0046)::0e4c4b3371ea @ 2026-03-07 | 55.1 | under_tested | 22 | 83.4 |
| 147 | qwen/qwen3.5-122b-a10b (recovered_after_fix) ($0.0099)::9237962e52ca @ 2026-03-04 | 54.9 | under_tested | 23 | 81.6 |
| 148 | qwen/qwen3-max-thinking ($0.0644)::00e3223323da @ 2026-03-07 | 54.6 | provisional | 34 | 67.6 |
| 149 | entrant_009_google--gemini-3-flash-preview::6ac5fff628cd @ 2026-03-07 | 54.5 | stable | 283 | 23.8 |
| 150 | entrant_001_gpt-5-mini::7ed20c1065d6 @ 2026-03-07 | 54.3 | stable | 284 | 23.8 |
| 151 | entrant_001_gpt-5-mini::ad1d783a4c70 @ 2026-03-07 | 54.3 | stable | 283 | 23.8 |
| 152 | entrant_001_gpt-5-mini::8f38c4d9855c @ 2026-03-07 | 54.2 | stable | 283 | 23.8 |
| 153 | entrant_002_gpt-5-mini @ 2026-03-07 | 54.0 | stable | 283 | 23.8 |
| 154 | entrant_002_gpt-5-nano::4e47419a7589 @ 2026-03-07 | 53.9 | stable | 283 | 23.8 |
| 155 | entrant_001_gpt-5-mini::2822279cbf1a @ 2026-03-07 | 53.8 | stable | 283 | 23.8 |
| 156 | qwen/qwen3.5-122b-a10b ($0.0466)::58a5ba6c9338 @ 2026-03-07 | 53.8 | provisional | 34 | 67.6 |
| 157 | entrant_004_gpt-5.3-codex::1cbbac7a4039 @ 2026-03-07 | 53.7 | stable | 284 | 23.8 |
| 158 | stepfun/step-3.5-flash:free (recovered_after_fix) ($0.0000)::1eb8204f4a33 @ 2026-02-27 | 53.4 | under_tested | 8 | 133.3 |
| 159 | entrant_001_gpt-5-mini::0c8fab332113 @ 2026-03-07 | 53.1 | stable | 283 | 23.8 |
| 160 | z-ai/glm-5 ($0.0102)::60475863ae30 @ 2026-03-07 | 52.9 | under_tested | 24 | 80.0 |
| 161 | gpt-5.3-codex ($0.0266)::86838dd03471 @ 2026-03-04 | 52.7 | under_tested | 18 | 91.8 |
| 162 | entrant_002_gpt-5-nano::099781c59e50 @ 2026-03-07 | 52.7 | stable | 283 | 23.8 |
| 163 | qwen/qwen3.5-122b-a10b ($0.0434)::71dca6c97f92 @ 2026-03-04 | 52.5 | under_tested | 27 | 75.6 |
| 164 | entrant_008_qwen--qwen3.5-122b-a10b::0241c2460b90 @ 2026-03-07 | 52.5 | stable | 283 | 23.8 |
| 165 | gpt-5-mini ($0.0222)::b4bd6cd5e542 @ 2026-02-27 | 52.4 | under_tested | 8 | 133.3 |
| 166 | bytedance-seed/seed-2.0-mini ($0.0010)::513ee872c075 @ 2026-03-07 | 51.8 | under_tested | 24 | 80.0 |
| 167 | google/gemini-3.1-pro-preview (recovered_after_fix) ($0.3999)::5540d6ab37a8 @ 2026-03-04 | 51.5 | under_tested | 18 | 91.8 |
| 168 | gpt-5-nano ($0.0138)::099781c59e50 @ 2026-02-27 | 51.2 | under_tested | 8 | 133.3 |
| 169 | gpt-5.3-codex ($0.0259)::665c73d58b45 @ 2026-03-07 | 51.1 | under_tested | 24 | 80.0 |
| 170 | gpt-5.4 ($0.0000)::18fd032f43d1 @ 2026-03-07 | 51.0 | under_tested | 22 | 83.4 |
| 171 | google/gemini-3.1-flash-lite-preview ($0.0032)::b0ae954bb34a @ 2026-03-04 | 51.0 | provisional | 30 | 71.8 |
| 172 | gpt-5-mini ($0.0077)::3928741d8858 @ 2026-03-07 | 50.6 | provisional | 34 | 67.6 |
| 173 | entrant_001_gpt-5-mini::67c9498f1701 @ 2026-03-07 | 50.2 | stable | 283 | 23.8 |
| 174 | entrant_011_moonshotai--kimi-k2.5::6b9555b535cf @ 2026-03-07 | 49.8 | stable | 284 | 23.8 |
| 175 | entrant_016_stepfun--step-3.5-flash_free::2aa14e16a463 @ 2026-03-07 | 49.6 | stable | 283 | 23.8 |
| 176 | entrant_001_gpt-5-mini::5bf4759e7c02 @ 2026-03-07 | 49.4 | stable | 283 | 23.8 |
| 177 | gpt-5.3-codex ($0.0000)::26c58495c5e9 @ 2026-02-27 | 48.8 | under_tested | 8 | 133.3 |
| 178 | entrant_006_z-ai--glm-5::432a47fdb873 @ 2026-03-07 | 47.9 | stable | 283 | 23.8 |
| 179 | arcee-ai/trinity-large-preview:free ($0.0000)::e5c9c34f4cf9 @ 2026-03-04 | 47.8 | under_tested | 26 | 77.0 |
| 180 | gpt-5.2 ($0.0430)::39826a082fb0 @ 2026-02-27 | 47.3 | under_tested | 6 | 151.2 |
| 181 | entrant_011_minimax--minimax-m2.5 @ 2026-03-07 | 47.3 | stable | 283 | 23.8 |
| 182 | gpt-5.2 ($0.0432)::a02047939ea9 @ 2026-03-07 | 47.3 | under_tested | 24 | 80.0 |
| 183 | gpt-5.3-codex ($0.0325)::057f7457c82d @ 2026-03-07 | 47.2 | under_tested | 22 | 83.4 |
| 184 | gpt-5.2 ($0.0319)::9e6a8333c618 @ 2026-03-07 | 47.1 | under_tested | 22 | 83.4 |
| 185 | gpt-5-mini ($0.0087)::87821c3c85b1 @ 2026-02-27 | 46.5 | under_tested | 8 | 133.3 |
| 186 | anthropic/claude-sonnet-4.6 ($0.3961)::74e8f80b29ee @ 2026-03-04 | 46.1 | provisional | 31 | 70.7 |
| 187 | gpt-5.2 ($0.0396)::761693f14061 @ 2026-02-27 | 45.7 | under_tested | 8 | 133.3 |
| 188 | gpt-5.3-codex ($0.0000)::1cbbac7a4039 @ 2026-02-27 | 45.6 | under_tested | 6 | 151.2 |
| 189 | entrant_011_moonshotai--kimi-k2.5::86974ddeead2 @ 2026-03-07 | 45.6 | stable | 284 | 23.8 |
| 190 | google/gemini-3.1-flash-lite-preview ($0.0027)::745448837948 @ 2026-03-07 | 45.4 | under_tested | 22 | 83.4 |
| 191 | entrant_000_gpt-5.2::47eb5fc99f6f @ 2026-03-07 | 45.3 | stable | 284 | 23.8 |
| 192 | gpt-5-nano ($0.0025)::34dfa9a03ec2 @ 2026-03-04 | 45.3 | under_tested | 14 | 103.3 |
| 193 | entrant_015_anthropic--claude-sonnet-4.6::8eeefac1ec17 @ 2026-03-07 | 44.9 | stable | 284 | 23.8 |
| 194 | stepfun/step-3.5-flash:free (recovered_after_fix) ($0.0000)::71211240e6e7 @ 2026-03-07 | 44.6 | provisional | 34 | 67.6 |
| 195 | anthropic/claude-sonnet-4.6 ($0.0648)::8eeefac1ec17 @ 2026-03-07 | 44.5 | under_tested | 24 | 80.0 |
| 196 | gpt-5.2 ($0.0300)::791483e95653 @ 2026-03-04 | 44.4 | under_tested | 20 | 87.3 |
| 197 | bytedance-seed/seed-2.0-mini ($0.0063)::284413223bc7 @ 2026-03-04 | 44.4 | under_tested | 27 | 75.6 |
| 198 | entrant_000_gpt-5.2::39826a082fb0 @ 2026-03-07 | 43.8 | stable | 285 | 23.8 |
| 199 | qwen/qwen3.5-122b-a10b (recovered_after_fix) ($0.0132)::b1f8ca87ed0a @ 2026-03-07 | 43.4 | under_tested | 22 | 83.4 |
| 200 | gpt-5-mini ($0.0090)::5bf4759e7c02 @ 2026-02-27 | 43.0 | under_tested | 6 | 151.2 |
| 201 | stepfun/step-3.5-flash:free ($0.0000)::2aa14e16a463 @ 2026-02-27 | 42.9 | under_tested | 12 | 110.9 |
| 202 | gpt-5.3-codex ($0.0544)::7b791c451590 @ 2026-03-04 | 42.6 | provisional | 31 | 70.7 |
| 203 | entrant_000_gpt-5.4::7e297e7b9118 @ 2026-03-07 | 42.2 | stable | 284 | 23.8 |
| 204 | entrant_012_deepseek--deepseek-v3.2::babb7f633345 @ 2026-03-07 | 41.9 | stable | 283 | 23.8 |
| 205 | google/gemini-3.1-flash-lite-preview ($0.0125)::c096dda29618 @ 2026-03-04 | 41.7 | under_tested | 27 | 75.6 |
| 206 | gpt-5.2-codex ($0.0487)::0b500f1f8734 @ 2026-02-27 | 41.5 | under_tested | 12 | 110.9 |
| 207 | deepseek/deepseek-v3.2 (recovered_after_fix) ($0.0038)::71a3315ecc07 @ 2026-03-04 | 41.0 | under_tested | 20 | 87.3 |
| 208 | entrant_010_google--gemini-3.1-flash-lite-preview::745448837948 @ 2026-03-07 | 40.6 | stable | 283 | 23.8 |
| 209 | entrant_013_deepseek--deepseek-v3.2::cd80f58124a8 @ 2026-03-07 | 40.5 | stable | 283 | 23.9 |
| 210 | stepfun/step-3.5-flash:free ($0.0000)::be86064bd9b6 @ 2026-02-27 | 40.5 | under_tested | 12 | 110.9 |
| 211 | arcee-ai/trinity-large-preview:free ($0.0000)::16bb68f624ee @ 2026-02-27 | 39.9 | under_tested | 8 | 133.3 |
| 212 | anthropic/claude-sonnet-4.6 ($0.3750)::263e91e37c96 @ 2026-03-04 | 39.5 | under_tested | 25 | 78.4 |
| 213 | entrant_013_deepseek--deepseek-v3.2::708fac99e5dc @ 2026-03-07 | 39.5 | stable | 283 | 23.8 |
| 214 | entrant_016_arcee-ai--trinity-large-preview_free::b15c0f016557 @ 2026-03-07 | 39.2 | stable | 282 | 23.9 |
| 215 | minimax/minimax-m2.5 ($0.0051)::06bd7cb68806 @ 2026-03-04 | 39.0 | under_tested | 27 | 75.6 |
| 216 | minimax/minimax-m2.5 (recovered_after_fix) ($0.0179)::7c939d8643c1 @ 2026-03-07 | 38.7 | provisional | 34 | 67.6 |
| 217 | deepseek/deepseek-v3.2 ($0.0018)::708fac99e5dc @ 2026-03-07 | 38.7 | under_tested | 22 | 83.4 |
| 218 | arcee-ai/trinity-large-preview:free (recovered_after_fix) ($0.0000)::b15c0f016557 @ 2026-03-07 | 38.4 | under_tested | 24 | 80.0 |
| 219 | entrant_007_qwen--qwen3-max-thinking::45cf191da6b2 @ 2026-03-07 | 38.3 | stable | 280 | 23.9 |
| 220 | gpt-5-nano ($0.0076)::03769244b16e @ 2026-02-27 | 37.7 | under_tested | 6 | 151.2 |
| 221 | entrant_012_deepseek--deepseek-v3.2::b09f4a5411ae @ 2026-03-07 | 37.7 | stable | 284 | 23.8 |
| 222 | google/gemini-3.1-flash-lite-preview ($0.0020)::5ed71f0ce79a @ 2026-03-07 | 37.4 | under_tested | 24 | 80.0 |
| 223 | gpt-5-mini ($0.0110)::8581fe62e905 @ 2026-03-04 | 37.4 | under_tested | 16 | 97.0 |
| 224 | entrant_001_gpt-5-mini::9643aa170276 @ 2026-03-07 | 37.1 | stable | 280 | 23.9 |
| 225 | arcee-ai/trinity-large-preview:free (recovered_after_fix) ($0.0000)::42448db4449b @ 2026-03-07 | 36.7 | under_tested | 22 | 83.4 |
| 226 | entrant_008_qwen--qwen3.5-122b-a10b::af4bb1a03d77 @ 2026-03-07 | 36.6 | stable | 282 | 23.9 |
| 227 | entrant_001_gpt-5-mini::048e9bf281bb @ 2026-03-07 | 36.5 | stable | 282 | 23.9 |
| 228 | entrant_002_gpt-5-nano::9d755956a0f6 @ 2026-03-07 | 36.3 | stable | 283 | 23.8 |
| 229 | qwen/qwen3-max-thinking (recovered_after_fix) ($0.0199)::83206da24217 @ 2026-03-07 | 36.3 | under_tested | 24 | 80.0 |
| 230 | gpt-5-nano ($0.0070)::3b80bc411288 @ 2026-02-27 | 36.0 | under_tested | 8 | 133.3 |
| 231 | entrant_010_minimax--minimax-m2.5::e7794d25f07b @ 2026-03-07 | 35.4 | stable | 284 | 23.8 |
| 232 | gpt-5-nano ($0.0055)::62315ee296bc @ 2026-03-04 | 35.4 | under_tested | 23 | 81.6 |
| 233 | deepseek/deepseek-v3.2 ($0.0018)::0638cde804dc @ 2026-03-07 | 35.1 | under_tested | 24 | 80.0 |
| 234 | entrant_019_bytedance-seed--seed-2.0-mini::1d511fe15598 @ 2026-03-07 | 35.0 | stable | 281 | 23.9 |
| 235 | entrant_015_arcee-ai--trinity-large-preview_free::09894b1bd9ea @ 2026-03-07 | 34.5 | stable | 282 | 23.9 |
| 236 | google/gemini-3.1-flash-lite-preview ($0.0169)::652b4056c583 @ 2026-03-04 | 34.4 | under_tested | 18 | 91.8 |
| 237 | qwen/qwen3.5-122b-a10b ($0.0038)::2b25ee71d64d @ 2026-03-07 | 33.8 | under_tested | 24 | 80.0 |
| 238 | entrant_019_bytedance-seed--seed-2.0-mini::20eb0e240e4a @ 2026-03-07 | 33.5 | under_tested | 1 | 282.8 |
| 239 | entrant_002_gpt-5-nano::03769244b16e @ 2026-03-07 | 32.3 | stable | 282 | 23.8 |
| 240 | qwen/qwen3-max-thinking ($0.0086)::99446e67ec0f @ 2026-03-07 | 31.9 | under_tested | 22 | 83.4 |
| 241 | deepseek/deepseek-v3.2 (recovered_after_fix) ($0.0083)::cd80f58124a8 @ 2026-03-07 | 31.8 | provisional | 34 | 67.6 |
| 242 | anthropic/claude-opus-4.6 ($0.7125)::01029ef54314 @ 2026-03-04 | 31.6 | under_tested | 26 | 77.0 |
| 243 | stepfun/step-3.5-flash:free (recovered_after_fix) ($0.0000)::4ab1bcc3e4b7 @ 2026-02-27 | 31.4 | under_tested | 8 | 133.3 |
| 244 | deepseek/deepseek-v3.2 ($0.0032)::7b6db8a35def @ 2026-03-04 | 31.1 | under_tested | 23 | 81.6 |
| 245 | entrant_002_gpt-5-nano::edc6e99823b9 @ 2026-03-07 | 30.9 | stable | 282 | 23.9 |
| 246 | bytedance-seed/seed-2.0-mini ($0.0009)::10023bce516e @ 2026-03-04 | 30.7 | under_tested | 19 | 89.4 |
| 247 | anthropic/claude-sonnet-4.6 ($0.6293)::1c1d04ac560e @ 2026-03-04 | 30.6 | under_tested | 24 | 80.0 |
| 248 | entrant_016_stepfun--step-3.5-flash_free::be86064bd9b6 @ 2026-03-07 | 29.1 | stable | 280 | 23.9 |
| 249 | entrant_007_qwen--qwen3-max-thinking::fe1be3eb2268 @ 2026-03-07 | 28.7 | stable | 282 | 23.9 |
| 250 | deepseek/deepseek-v3.2 ($0.0033)::301ceb9d61df @ 2026-03-04 | 28.7 | under_tested | 22 | 83.4 |
| 251 | deepseek/deepseek-v3.2 ($0.0114)::af7298d9a915 @ 2026-03-04 | 28.7 | under_tested | 26 | 77.0 |
| 252 | entrant_013_deepseek--deepseek-v3.2::0638cde804dc @ 2026-03-07 | 28.4 | stable | 284 | 23.8 |
| 253 | entrant_015_arcee-ai--trinity-large-preview_free::16bb68f624ee @ 2026-03-07 | 28.2 | stable | 281 | 23.9 |
| 254 | entrant_007_qwen--qwen3-max-thinking::352e53cd1449 @ 2026-03-07 | 28.1 | stable | 282 | 23.9 |
| 255 | entrant_007_qwen--qwen3-max-thinking::44e3d89d6410 @ 2026-03-07 | 28.1 | stable | 282 | 23.9 |
| 256 | moonshotai/kimi-k2.5 ($0.0088)::3417d570adb7 @ 2026-03-04 | 27.6 | under_tested | 14 | 103.3 |
| 257 | entrant_015_arcee-ai--trinity-large-preview_free::4a3b35ba8c06 @ 2026-03-07 | 27.5 | stable | 280 | 23.9 |
| 258 | qwen/qwen3.5-122b-a10b ($0.0646)::43c91e963cbe @ 2026-03-04 | 27.3 | under_tested | 15 | 100.0 |
| 259 | entrant_002_gpt-5-nano::d41b2f44dda7 @ 2026-03-07 | 26.4 | stable | 282 | 23.9 |
| 260 | z-ai/glm-5 ($0.0443)::2490d4ff540f @ 2026-03-04 | 26.3 | under_tested | 24 | 80.0 |
| 261 | entrant_003_gpt-5.2-codex::557237351b91 @ 2026-03-07 | 26.1 | stable | 280 | 23.9 |
| 262 | entrant_016_stepfun--step-3.5-flash_free::c36f05dc9ad2 @ 2026-03-07 | 25.8 | stable | 282 | 23.9 |
| 263 | minimax/minimax-m2.5 ($0.0130)::33656ecfc86a @ 2026-03-04 | 25.5 | provisional | 33 | 68.6 |
| 264 | gpt-5-nano ($0.0032)::b71e9163bf77 @ 2026-03-04 | 25.0 | under_tested | 15 | 100.0 |
| 265 | entrant_015_arcee-ai--trinity-large-preview_free::0ace044aeb44 @ 2026-03-07 | 24.8 | stable | 211 | 27.6 |
| 266 | entrant_002_gpt-5-nano::168b4641c9d2 @ 2026-03-07 | 24.7 | stable | 282 | 23.9 |
| 267 | google/gemini-3.1-flash-lite-preview (recovered_after_fix) ($0.0079)::5fa1aa40c3fd @ 2026-03-04 | 24.5 | under_tested | 14 | 103.3 |
| 268 | entrant_003_gpt-5.2-codex::0b500f1f8734 @ 2026-03-07 | 24.5 | stable | 281 | 23.9 |
| 269 | entrant_016_stepfun--step-3.5-flash_free::4ab1bcc3e4b7 @ 2026-03-07 | 24.3 | stable | 280 | 23.9 |
| 270 | entrant_016_arcee-ai--trinity-large-preview_free::42448db4449b @ 2026-03-07 | 24.3 | stable | 281 | 23.9 |
| 271 | entrant_016_arcee-ai--trinity-large-preview_free::0b87b7222640 @ 2026-03-07 | 24.1 | stable | 280 | 23.9 |
| 272 | entrant_008_qwen--qwen3.5-122b-a10b::4dfac77a88dd @ 2026-03-07 | 23.3 | stable | 281 | 23.9 |
| 273 | google/gemini-3.1-flash-lite-preview ($0.0044)::2372e9571823 @ 2026-03-04 | 23.3 | under_tested | 14 | 103.3 |
| 274 | arcee-ai/trinity-large-preview:free ($0.0000)::ce841544258f @ 2026-02-27 | 23.0 | under_tested | 12 | 110.9 |
| 275 | gpt-5-mini ($0.0097)::048e9bf281bb @ 2026-02-27 | 22.9 | under_tested | 12 | 110.9 |
| 276 | arcee-ai/trinity-large-preview:free ($0.0000)::1b493558fdb1 @ 2026-03-04 | 22.7 | under_tested | 19 | 89.4 |
| 277 | entrant_015_arcee-ai--trinity-large-preview_free::ce841544258f @ 2026-03-07 | 20.4 | stable | 281 | 23.9 |
| 278 | bytedance-seed/seed-2.0-mini (recovered_after_fix) ($0.0050)::1091b348e996 @ 2026-03-04 | 20.4 | under_tested | 7 | 141.4 |
| 279 | entrant_009_qwen--qwen3.5-122b-a10b::b1f8ca87ed0a @ 2026-03-07 | 20.3 | stable | 281 | 23.9 |
| 280 | entrant_015_arcee-ai--trinity-large-preview_free::545a42bbbd09 @ 2026-03-07 | 18.9 | stable | 281 | 23.9 |
| 281 | bytedance-seed/seed-2.0-mini ($0.0047)::1d511fe15598 @ 2026-03-07 | 18.9 | provisional | 34 | 67.6 |
| 282 | entrant_010_google--gemini-3.1-flash-lite-preview::4d6f4419c790 @ 2026-03-07 | 17.6 | stable | 280 | 23.9 |
| 283 | gpt-5-nano ($0.0058)::edc6e99823b9 @ 2026-02-27 | 17.3 | under_tested | 12 | 110.9 |
| 284 | entrant_015_arcee-ai--trinity-large-preview_free::c0e35d0722f2 @ 2026-03-07 | 17.1 | stable | 281 | 23.9 |
| 285 | entrant_010_minimax--minimax-m2.5::856d0f4c9892 @ 2026-03-07 | 16.9 | stable | 281 | 23.9 |
| 286 | entrant_010_google--gemini-3.1-flash-lite-preview::5ed71f0ce79a @ 2026-03-07 | 16.2 | stable | 282 | 23.8 |
| 287 | arcee-ai/trinity-large-preview:free (recovered_after_fix) ($0.0000)::1b9e3f0b2b30 @ 2026-03-04 | 14.2 | under_tested | 26 | 77.0 |
| 288 | gpt-5-nano ($0.0041)::b5ef3d9318f0 @ 2026-02-27 | 12.9 | under_tested | 12 | 110.9 |
| 289 | arcee-ai/trinity-large-preview:free (recovered_after_fix) ($0.0000)::b0e21c8cc606 @ 2026-03-04 | 12.1 | under_tested | 12 | 110.9 |
| 290 | entrant_008_qwen--qwen3-max-thinking::99446e67ec0f @ 2026-03-07 | 12.0 | stable | 280 | 23.9 |
| 291 | entrant_002_gpt-5-nano::b5ef3d9318f0 @ 2026-03-07 | 11.9 | stable | 282 | 23.9 |
| 292 | entrant_003_gpt-5-nano @ 2026-03-07 | 10.9 | stable | 281 | 23.9 |
| 293 | entrant_015_arcee-ai--trinity-large-preview_free::0f8a48b690b6 @ 2026-03-07 | 10.7 | stable | 162 | 31.3 |
| 294 | entrant_010_minimax--minimax-m2.5::80374b7181ce @ 2026-03-07 | 9.9 | stable | 281 | 23.9 |
| 295 | entrant_008_qwen--qwen3-max-thinking::83206da24217 @ 2026-03-07 | 9.6 | stable | 281 | 23.9 |
| 296 | entrant_002_gpt-5-nano::04639b45a655 @ 2026-03-07 | 9.2 | stable | 281 | 23.9 |
| 297 | entrant_008_qwen--qwen3.5-122b-a10b::547f7c89c067 @ 2026-03-07 | 7.6 | stable | 283 | 23.8 |
| 298 | entrant_012_deepseek--deepseek-v3.2::1516bc091028 @ 2026-03-07 | 6.0 | stable | 282 | 23.9 |
| 299 | entrant_009_qwen--qwen3.5-122b-a10b::2b25ee71d64d @ 2026-03-07 | 5.7 | stable | 280 | 23.9 |
| 300 | entrant_002_gpt-5-nano::3b80bc411288 @ 2026-03-07 | 4.9 | stable | 282 | 23.9 |
| 301 | gpt-5-nano ($0.0049)::1a34fca062d0 @ 2026-03-07 | 4.9 | provisional | 34 | 67.6 |
| 302 | gpt-5.2-codex ($0.0275)::124e05529c56 @ 2026-03-04 | 4.8 | under_tested | 21 | 85.3 |
| 303 | entrant_016_stepfun--step-3.5-flash_free::57027fa97bfc @ 2026-03-07 | 4.1 | stable | 281 | 23.9 |
| 304 | entrant_002_gpt-5-nano::681d5465556b @ 2026-03-07 | 3.8 | stable | 282 | 23.8 |
| 305 | entrant_013_anthropic--claude-opus-4.6::6ba3403d42aa @ 2026-03-07 | 3.4 | under_tested | 24 | 80.0 |
| 306 | bytedance-seed/seed-2.0-mini ($0.0062)::9c565cec5a53 @ 2026-03-04 | 2.9 | provisional | 33 | 68.6 |
| 307 | entrant_015_arcee-ai--trinity-large-preview_free::9a5c7e5c7b07 @ 2026-03-07 | 1.0 | stable | 282 | 23.9 |
| 308 | arcee-ai/trinity-large-preview:free (recovered_after_fix) ($0.0000)::0b87b7222640 @ 2026-03-07 | 0.3 | provisional | 34 | 67.6 |
| 309 | qwen/qwen3.5-122b-a10b (recovered_after_fix) ($0.0115)::140e17a0d40b @ 2026-03-04 | 0.0 | under_tested | 12 | 110.9 |
| 310 | arcee-ai/trinity-large-preview:free ($0.0000)::4a3b35ba8c06 @ 2026-02-27 | 0.0 | under_tested | 8 | 133.3 |
| 311 | google/gemini-3.1-flash-lite-preview ($0.0023)::1be8da66db78 @ 2026-03-04 | 0.0 | under_tested | 19 | 89.4 |
| 312 | arcee-ai/trinity-large-preview:free (recovered_after_fix) ($0.0000)::c0e35d0722f2 @ 2026-02-27 | 0.0 | under_tested | 8 | 133.3 |
| 313 | gpt-5-nano ($0.0031)::a37024d8b02c @ 2026-03-04 | 0.0 | under_tested | 21 | 85.3 |
| 314 | arcee-ai/trinity-large-preview:free ($0.0000)::682f10efa6e9 @ 2026-03-04 | 0.0 | under_tested | 26 | 77.0 |
| 315 | arcee-ai/trinity-large-preview:free ($0.0000)::545a42bbbd09 @ 2026-02-27 | 0.0 | under_tested | 12 | 110.9 |
| 316 | gpt-5-nano (recovered_after_fix) ($0.0065)::7b7318670453 @ 2026-03-04 | 0.0 | provisional | 31 | 70.7 |
| 317 | google/gemini-3.1-flash-lite-preview ($0.0040)::4d6f4419c790 @ 2026-03-07 | 0.0 | provisional | 34 | 67.6 |