Frontier-CS Benchmark

A benchmark that doesn't saturate.

01

Unsolved

No solution has achieved a perfect score.

02

Open-ended

Research and optimization challenges.

03

Verifiable

Continuous scoring, always room to improve.

04

Diverse

Systems, ML, algorithms, security, and more.

frontier-cs · harbor trial · 2.0

# Run a Frontier-CS 2.0 task through Harbor

$ uv run frontier harbor trial 2.0 erdos_unit_distance \
    -a codex -m gpt-5.5 --json

generating frontier-cs-2-0-erdos-unit-distance

starting Harbor trial with iterative submissions...

submit #1 score: 55.80

submit #2 score: 78.50

trial_status=scored agent_status=completed

reward=0.7850 cost=$1.94 submissions=2

tokens: 1.51M input / 19.9K output

✓ Harbor trial scored: 78.50

✓ result.json and verifier artifacts saved
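The summary line in the sample run above can be read programmatically. The following is a minimal sketch, assuming the `key=value` layout shown in that output; `parse_summary` is a hypothetical helper, not part of the `frontier` CLI, and the relationship between `reward` and the final score is inferred from this single run (0.7850 against 78.50).

```python
import re


def parse_summary(line: str) -> dict:
    """Split a 'key=value key=value' log line into a dict of raw strings."""
    # Hypothetical helper: assumes whitespace-separated key=value pairs,
    # as in the sample output shown above.
    return dict(re.findall(r"(\w+)=(\S+)", line))


summary = parse_summary("reward=0.7850 cost=$1.94 submissions=2")

reward = float(summary["reward"])
cost_usd = float(summary["cost"].lstrip("$"))  # drop the currency sign
submissions = int(summary["submissions"])

# Assumption drawn from the sample run only: reward is the score divided by 100.
score = round(reward * 100, 2)

print(score, cost_usd, submissions)
```

If the saved `result.json` exposes the same fields, reading that file directly would likely be more robust than parsing console output.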


Leaderboard

Updated 2026-05-13

Algorithmic Track

172 problems

| Rank | Model | Score@1 | Avg@5 | Score@5 | Elo |
|---|---|---|---|---|---|
| 1 | gemini-3.0-pro | 33.12 | 34.58 | 56.09 | 1265 |
| 2 | gpt-5.2-thinking | 32.40 | 33.11 | 47.19 | 1242 |
| 3 | gpt-5-thinking | 23.10 | 22.58 | 39.73 | 1196 |
| 4 | deepseek-3.2 | 24.83 | 23.89 | 41.44 | 1193 |
| 5 | grok-4 | 24.04 | 22.98 | 36.81 | 1174 |
| 6 | gemini-2.5-pro | 20.34 | 19.32 | 36.65 | 1167 |
| 7 | gpt-5.1-thinking | 20.64 | 21.49 | 34.76 | 1164 |

Human reference: 86.99 (Score@1)
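The page does not define the leaderboard columns. As a rough sketch under one plausible reading, Score@k is taken here to be the best score among the first k attempts on a problem and Avg@k the mean over those attempts; both definitions, the function names, and the sample scores are assumptions for illustration, not the benchmark's documented methodology.

```python
from statistics import mean


def score_at_k(scores: list, k: int) -> float:
    """Best score among the first k attempts (assumed reading of Score@k)."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of attempts")
    return max(scores[:k])


def avg_at_k(scores: list, k: int) -> float:
    """Mean score over the first k attempts (assumed reading of Avg@k)."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of attempts")
    return mean(scores[:k])


# Made-up scores for five attempts on a single hypothetical problem.
attempts = [20.0, 35.0, 10.0, 50.0, 25.0]

print("Score@1:", score_at_k(attempts, 1))
print("Avg@5:  ", avg_at_k(attempts, 5))
print("Score@5:", score_at_k(attempts, 5))
```

Under this reading, Score@5 can never be lower than Score@1 for the same attempts, which is consistent with every row in the tables.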

Research Track

68 problems

| Rank | Model | Score@1 | Avg@5 | Score@5 | Elo |
|---|---|---|---|---|---|
| 1 | gemini-3.0-pro | 46.55 | 43.14 | 59.22 | 1283 |
| 2 | gpt-5-thinking | 30.91 | 34.94 | 55.25 | 1218 |
| 3 | gpt-5.1-thinking | 32.12 | 33.70 | 56.79 | 1214 |
| 4 | gpt-5.2-thinking | 30.29 | 34.09 | 58.90 | 1210 |
| 5 | gemini-2.5-pro | 21.66 | 25.74 | 51.57 | 1180 |
| 6 | grok-4 | 26.75 | 24.01 | 48.15 | 1149 |
| 7 | deepseek-3.2 | 21.51 | 21.76 | 44.41 | 1146 |

Agent Track

178 tasks

| Rank | Agent | Score | Avg Steps | Avg Tools | Avg Tokens |
|---|---|---|---|---|---|
| 1 | Kimi K2.6 | 46.9 | 67.2 | 70.6 | 155.6K |
| 2 | Claude Code Opus 4.7 | 43.0 | 77.2 | 42.2 | 251K |

Preview results from Harbor runs with a 5-hour timeout per task. See the release blog for trace-level analysis.


Example tasks

Frontier-CS tasks are open-ended, verifiable optimization challenges: agents can inspect the task, iterate on submissions, and climb a continuous score instead of passing a single hidden test.
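The iterate-and-climb workflow described above can be sketched as a simple loop. This is an illustrative toy, not the Harbor harness: `iterate_submissions`, `propose`, and `evaluate` are hypothetical names, the verifier is a stand-in that returns a continuous score, and keeping the best-scoring submission is an assumption about how results are recorded.

```python
from typing import Callable, List, Optional, Tuple


def iterate_submissions(
    propose: Callable[[Optional[float]], str],
    evaluate: Callable[[str], float],
    max_submissions: int,
) -> Tuple[str, float, List[float]]:
    """Submit candidates repeatedly and keep the best-scoring one."""
    best_solution, best_score = "", float("-inf")
    history: List[float] = []
    last_score: Optional[float] = None
    for _ in range(max_submissions):
        candidate = propose(last_score)      # agent sees feedback from the last try
        last_score = evaluate(candidate)     # verifier returns a continuous score
        history.append(last_score)
        if last_score > best_score:
            best_solution, best_score = candidate, last_score
    return best_solution, best_score, history


# Toy stand-ins: the "task" rewards guesses close to 42 on a 0-100 scale.
guesses = iter(["10", "30", "40", "45"])


def propose(last_score: Optional[float]) -> str:
    return next(guesses)


def evaluate(candidate: str) -> float:
    return max(0.0, 100.0 - abs(int(candidate) - 42))


best, best_score, history = iterate_submissions(propose, evaluate, max_submissions=4)
print(best, best_score, history)
```

Because every submission receives a graded score rather than a pass or fail, later attempts can regress (as the last toy guess does) without losing the best result so far.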


100+ Contributors from

Academic institutions

UC Berkeley
Princeton University
Stanford University
MIT
UCSD
University of Washington (UW)
Georgia Tech
University of Michigan
New York University (NYU)
UIUC
University of Toronto
Nanyang Technological University (NTU)
