Performance Overview
[Figure: bar chart of Overall score (%) by agent and LLM, ranging from ML-Master 2.0 (Deepseek-V3.2-Speciale) at 56.4% down to MLAB (gpt-4o-2024-08-06) at 1.6%. The plotted values are the Overall column of the leaderboard below, rounded to one decimal.]
Leaderboard
| Agent | LLM | Low/Lite (%) | Medium (%) | High (%) | Overall (%) | Time | Date | Reports | Source |
|---|---|---|---|---|---|---|---|---|---|
| ML-Master 2.0 | Deepseek-V3.2-Speciale | 75.76 ± 1.51 | 50.88 ± 3.51 | 42.22 ± 2.22 | 56.44 ± 2.47 | 24h | 2025-12-16 | Available | - |
| Leeroo | Gemini-3-Pro-Preview | 68.18 ± 2.62 | 44.74 ± 1.52 | 40.00 ± 0.00 | 50.67 ± 1.33 | 24h | 2025-12-07 | Available | - |
| Thesis | gpt-5-codex | 65.15 ± 1.52 | 45.61 ± 7.18 | 31.11 ± 2.22 | 48.44 ± 3.64 | 24h | 2025-11-10 | Available | - |
| CAIR MLE-STAR-Pro-1.5 | Gemini-2.5-Pro | 68.18 ± 2.62 | 34.21 ± 1.52 | 33.33 ± 0.00 | 44.00 ± 1.33 | 24h | 2025-11-25 | Available | - |
| FM Agent | Gemini-2.5-Pro | 62.12 ± 1.52 | 36.84 ± 1.52 | 33.33 ± 0.00 | 43.56 ± 0.89 | 24h | 2025-10-10 | Available | - |
| Operand ensemble | gpt-5 (low verbosity/effort) | 63.64 ± 0.00 | 33.33 ± 0.88 | 20.00 ± 0.00 | 39.56 ± 0.44 | 24h | 2025-10-06 | Available | - |
| CAIR MLE-STAR-Pro-1.0 | Gemini-2.5-Pro | 66.67 ± 1.52 | 25.44 ± 0.88 | 31.11 ± 2.22 | 38.67 ± 0.77 | 12h | 2025-11-03 | Available | - |
| InternAgent | deepseek-r1 | 62.12 ± 3.03 | 26.32 ± 2.63 | 24.44 ± 2.22 | 36.44 ± 1.18 | 12h | 2025-09-12 | Available | - |
| R&D-Agent | gpt-5 | 68.18 ± 2.62 | 21.05 ± 1.52 | 22.22 ± 2.22 | 35.11 ± 0.44 | 12h | 2025-09-26 | Available | Available |
| Neo multi-agent | undisclosed | 48.48 ± 1.52 | 29.82 ± 2.32 | 24.44 ± 2.22 | 34.22 ± 0.89 | 36h | 2025-07-28 | Available | - |
| AIRA-dojo | o3 | 55.00 ± 1.47 | 21.97 ± 1.17 | 21.67 ± 1.07 | 31.60 ± 0.82 | 24h | 2025-05-15 | Available | Available |
| R&D-Agent | o3 + GPT-4.1 | 51.52 ± 4.01 | 19.30 ± 3.16 | 26.67 ± 0.00 | 30.22 ± 0.89 | 24h | 2025-08-15 | Available | Available |
| ML-Master | deepseek-r1 | 48.48 ± 1.52 | 20.18 ± 2.32 | 24.44 ± 2.22 | 29.33 ± 0.77 | 12h | 2025-06-17 | Available | Available |
| R&D-Agent | o1-preview | 48.18 ± 1.11 | 8.95 ± 1.05 | 18.67 ± 1.33 | 22.40 ± 0.50 | 24h | 2025-05-14 | Available | Available |
| AIDE | o1-preview | 35.91 ± 1.86 | 8.45 ± 0.43 | 11.67 ± 1.27 | 17.12 ± 0.61 | 24h | 2024-10-08 | Available | Available |
| AIDE | gpt-4o-2024-08-06 | 18.55 ± 1.26 | 3.06 ± 0.33 | 8.15 ± 0.84 | 8.63 ± 0.54 | 24h | 2024-10-08 | Available | Available |
| AIDE | claude-3-5-sonnet-20240620 | 19.70 ± 1.52 | 2.63 ± 1.52 | 2.22 ± 2.22 | 7.56 ± 1.60 | 24h | 2024-10-08 | Available | Available |
| OpenHands | gpt-4o-2024-08-06 | 12.12 ± 1.52 | 1.75 ± 0.88 | 2.22 ± 2.22 | 4.89 ± 0.44 | 24h | 2024-10-08 | Available | Available |
| AIDE | llama-3.1-405b-instruct | 10.23 ± 1.14 | 0.66 ± 0.66 | 0.00 ± 0.00 | 3.33 ± 0.38 | 24h | 2024-10-08 | Available | Available |
| MLAB | gpt-4o-2024-08-06 | 4.55 ± 0.86 | 0.00 ± 0.00 | 0.00 ± 0.00 | 1.60 ± 0.27 | 24h | 2024-10-08 | Available | Available |
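The Overall column appears to be a competition-weighted average of the three complexity splits. The sketch below checks this for a few rows. It assumes that the Low/Lite, Medium and High splits contain 22, 38 and 15 competitions, which are the MLE-bench split sizes; the table itself does not state these counts. `SPLIT_SIZES`, `ROWS` and `weighted_overall` are illustrative names and are not part of any published tooling.

```python
# Assumption: split sizes follow MLE-bench (22 Low/Lite, 38 Medium, 15 High = 75 competitions).
# The leaderboard table does not list these counts itself.
SPLIT_SIZES = {"low": 22, "medium": 38, "high": 15}

# Mean per-split scores (%) and the listed Overall score (%), copied from the table above.
ROWS = {
    "ML-Master 2.0 (Deepseek-V3.2-Speciale)": ({"low": 75.76, "medium": 50.88, "high": 42.22}, 56.44),
    "Leeroo (Gemini-3-Pro-Preview)": ({"low": 68.18, "medium": 44.74, "high": 40.00}, 50.67),
    "R&D-Agent (gpt-5)": ({"low": 68.18, "medium": 21.05, "high": 22.22}, 35.11),
    "AIRA-dojo (o3)": ({"low": 55.00, "medium": 21.97, "high": 21.67}, 31.60),
}


def weighted_overall(split_scores: dict[str, float], sizes: dict[str, int] = SPLIT_SIZES) -> float:
    """Return the mean of per-split scores, weighted by the number of competitions in each split."""
    total = sum(sizes.values())
    return sum(split_scores[split] * count for split, count in sizes.items()) / total


# Compare the weighted mean with the Overall value listed in the table.
for agent, (splits, listed) in ROWS.items():
    print(f"{agent}: weighted {weighted_overall(splits):.2f}, listed {listed:.2f}")
```

For the four rows included here, the weighted mean matches the listed Overall to within 0.01 points. The weighting does not reproduce every row, however. For MLAB (gpt-4o-2024-08-06) it gives 4.55 × 22 / 75 ≈ 1.33, while the table lists 1.60. Treat this as a consistency check rather than as the leaderboard's exact formula.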