Leaderboard
The TrialDesignBench leaderboard will report agent performance on the public benchmark suite.
Planned Metrics
- Overall task pass rate.
- Checkpoint-level pass rate.
- Clinical safety and validity flags.
- Consistency across repeated runs.
- Cost, latency, and tool-use summaries where available.
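
As a rough illustration of how the pass-rate and consistency metrics above could be computed, here is a minimal Python sketch. The benchmark has not published its scoring definitions, so everything in it is an assumption made for illustration: the `RunResult` record and its field names, the rule that a task passes only when all of its checkpoints pass, and the definition of consistency as the share of tasks whose repeated runs agree on pass/fail.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """One agent run on one task (hypothetical record layout)."""
    task_id: str
    run_index: int
    checkpoints_passed: int
    checkpoints_total: int

    @property
    def passed(self) -> bool:
        # Assumption: a task passes only if every checkpoint passes.
        return self.checkpoints_passed == self.checkpoints_total


def overall_pass_rate(results):
    """Fraction of runs in which the whole task passed (expects a non-empty list)."""
    return sum(r.passed for r in results) / len(results)


def checkpoint_pass_rate(results):
    """Fraction of all checkpoints passed, pooled across runs."""
    total = sum(r.checkpoints_total for r in results)
    return sum(r.checkpoints_passed for r in results) / total


def consistency(results):
    """Fraction of tasks whose repeated runs all agree on pass/fail."""
    outcomes_by_task = defaultdict(set)
    for r in results:
        outcomes_by_task[r.task_id].add(r.passed)
    agreeing = sum(len(outcomes) == 1 for outcomes in outcomes_by_task.values())
    return agreeing / len(outcomes_by_task)


# Made-up sample: two tasks, two runs each.
sample = [
    RunResult("t1", 0, 4, 4),
    RunResult("t1", 1, 4, 4),
    RunResult("t2", 0, 3, 4),
    RunResult("t2", 1, 4, 4),
]
print(overall_pass_rate(sample))     # 0.75
print(checkpoint_pass_rate(sample))  # 0.9375
print(consistency(sample))           # 0.5
```

In this sample, task `t2` fails one run and passes the other, so it lowers both the overall pass rate and the consistency score while barely moving the checkpoint-level rate. That difference is one reason to report the metrics separately.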
Submission Status
Leaderboard submissions are not open yet. This page will be updated with submission instructions, model results, and evaluation notes when the benchmark is released.