Leaderboard - TrialDesignBench

The TrialDesignBench leaderboard will report agent performance on the public benchmark suite.

Planned Metrics

Overall task pass rate.
Checkpoint-level pass rate.
Clinical safety and validity flags.
Consistency across repeated runs.
Cost, latency, and tool-use summaries where available.

Submission Status

Leaderboard submissions are not open yet. This page will be updated with submission instructions, model results, and evaluation notes when the benchmark is released.