About TrialDesignBench

TrialDesignBench evaluates AI agents on realistic clinical trial design tasks, where small mistakes can undermine scientific validity, patient safety, or regulatory readiness, or drive up cost.

The benchmark is designed as an open starting point for measuring end-to-end agent performance across protocol reasoning, statistical design, operational constraints, safety considerations, and documentation quality.

Goals

  • Measure complete trial design workflows rather than isolated question answering.
  • Use tasks with clear success criteria and auditable grading.
  • Keep evaluation environments reproducible across models and agent scaffolds.
  • Give researchers a shared reference point for progress in high-stakes biomedical AI.

Status

This website is an initial content scaffold. The task set, evaluation harness, paper, dataset, and leaderboard pages are currently placeholders, which the TrialDesignBench team will populate as the benchmark matures.
