Rankings based on user votes and Elo ratings
| # | Model | Elo | Win % | Votes | Record |
|---|---|---|---|---|---|
| 1 | Gemini 3 Flash Preview | 1268 | 64.3% | 14 | 9W-3L-2T |
| 2 | Functionary Swahili Large | 1262 | 57.7% | 26 | 15W-9L-2T |
| 3 | GLM 5 | 1254 | 42.9% | 21 | 9W-5L-7T |
| 4 | Claude Sonnet 4.5 | 1244 | 57.1% | 14 | 8W-5L-1T |
| 5 | GPT-oss-120B | 1241 | 38.5% | 13 | 5W-3L-5T |
| 6 | Claude Sonnet 4.6 | 1233 | 53.3% | 15 | 8W-6L-1T |
| 7 | Grok 4.1 Fast | 1208 | 41.7% | 24 | 10W-11L-3T |
| 8 | Rnj 1 Instruct | 1193 | 37.5% | 8 | 3W-3L-2T |
| 9 | Functionary Swahili Mini | 1191 | 40.0% | 20 | 8W-8L-4T |
| 10 | Claude Haiku 4.5 | 1182 | 36.8% | 19 | 7W-9L-3T |
| 11 | Gemini 2.5 Flash Lite | 1178 | 33.3% | 15 | 5W-7L-3T |
| 12 | GPT-5.2 | 1155 | 33.3% | 18 | 6W-9L-3T |
| 13 | Trinity Large Preview | 1140 | 33.3% | 15 | 5W-9L-1T |
| 14 | GPT-5 Nano | 1120 | 33.3% | 21 | 7W-13L-1T |
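The page does not say exactly how its Elo ratings are computed. The sketch below assumes the textbook Elo formulation: a logistic expected score on a 400-point scale, an assumed K-factor of 32, and ties scored as 0.5. It shows how a single vote would move two ratings. The Win % column, by contrast, can be checked directly against the table. In every row, Votes equals W + L + T, and Win % equals wins divided by that total, so ties are included in the denominator (for example, 9 / 14 = 64.3%). The helper names `expected_score`, `elo_update`, and `win_pct` are illustrative and do not come from the leaderboard.

```python
import re


def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook Elo expected score of A against B (logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one head-to-head vote.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a tie.
    k=32 is an ASSUMED K-factor; the leaderboard does not publish its own.
    """
    e_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - e_a)
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


def win_pct(record: str) -> float:
    """Win % as the table appears to compute it: wins / (wins + losses + ties)."""
    m = re.fullmatch(r"(\d+)W-(\d+)L-(\d+)T", record)
    if m is None:
        raise ValueError(f"unrecognized record: {record!r}")
    wins, losses, ties = map(int, m.groups())
    return 100.0 * wins / (wins + losses + ties)


if __name__ == "__main__":
    # A few (model, Win %, Record) rows copied from the table above.
    rows = [
        ("Gemini 3 Flash Preview", 64.3, "9W-3L-2T"),
        ("Functionary Swahili Large", 57.7, "15W-9L-2T"),
        ("GLM 5", 42.9, "9W-5L-7T"),
    ]
    for name, shown, record in rows:
        # The recomputed percentage matches the displayed one after rounding.
        assert round(win_pct(record), 1) == shown, name

    # Under the assumed formula, the top-rated model's expected score
    # against the lowest-rated one.
    print(round(expected_score(1268, 1120), 2))
```

Because votes per model are small (8 to 26), the choice of K-factor and the order in which votes are processed can noticeably shift sequential Elo ratings. This is one reason Elo order and Win % order differ in the table.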
Per-category ranks (a dash marks a category in which the model is unranked)
| Model | Overall | Long Prompt | Hard Prompt | Math | Instruction-Following | Coding |
|---|---|---|---|---|---|---|
| Gemini 3 Flash Preview | 1 | — | 10 | — | — | 5 |
| Functionary Swahili Large | 2 | — | 1 | — | 8 | 3 |
| GLM 5 | 3 | — | 6 | — | 5 | 4 |
| Claude Sonnet 4.5 | 4 | — | 2 | 1 | — | 13 |
| GPT-oss-120B | 5 | — | 7 | — | 3 | 8 |
| Claude Sonnet 4.6 | 6 | — | 3 | — | 12 | 9 |
| Grok 4.1 Fast | 7 | — | 14 | — | 2 | 2 |
| Rnj 1 Instruct | 8 | — | 11 | — | 4 | — |
| Functionary Swahili Mini | 9 | — | 4 | — | 10 | 6 |
| Claude Haiku 4.5 | 10 | — | 9 | — | 11 | 1 |
| Gemini 2.5 Flash Lite | 11 | — | 12 | 2 | 1 | 10 |
| GPT-5.2 | 12 | — | 8 | 3 | 6 | 12 |
| Trinity Large Preview | 13 | — | 5 | — | 7 | 7 |
| GPT-5 Nano | 14 | — | 13 | 4 | 9 | 11 |