TrialDesignBench
TrialDesignBench measures whether AI agents can complete high-stakes trial design workflows with clinically grounded reasoning, auditable outputs, and reproducible evaluation.
Benchmark tasks: TBD
Evaluation checkpoints: TBD
Agent submissions: TBD
Agents must synthesize evidence, reason about design tradeoffs, and produce outputs that survive clinical, statistical, and safety review.
Transform trial objectives and clinical context into coherent protocol concepts.
Select endpoints, estimands, assumptions, sample sizes, and adaptive design choices.
Balance enrollment, follow-up, site capacity, cost, and timeline constraints.
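As one illustration of the sample-size choices mentioned above, the following is a minimal sketch of the standard normal-approximation formula for a two-arm trial with a continuous endpoint and equal allocation. It is not taken from the benchmark itself: the function name `per_arm_sample_size` and its parameters are hypothetical, and a real design would also account for dropout, multiplicity, and any interim analyses.

```python
from math import ceil
from statistics import NormalDist


def per_arm_sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided, two-sample comparison of means.

    Hypothetical helper for illustration only. Uses the normal approximation
    n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2.

    delta: smallest difference in means worth detecting (assumed effect size)
    sigma: common standard deviation of the endpoint (assumed known)
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for target power

    # Round up because a fractional participant cannot be enrolled.
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


if __name__ == "__main__":
    # Effect size of half a standard deviation, 5% two-sided alpha, 80% power.
    print(per_arm_sample_size(delta=0.5, sigma=1.0))
```

Halving the assumed effect size roughly quadruples the required enrollment, which is the kind of tradeoff between statistical assumptions and operational feasibility that the design choices above have to balance.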
Public results will appear here when the benchmark is released.
| Rank | Model | Agent | Pass Rate | Status |
|---|---|---|---|---|
| - | Pending | Pending | TBD | Not yet released |
Methodology
Tasks reflect realistic design decisions and expert review.
Rubrics capture critical reasoning, artifacts, and safety constraints.
Versioned inputs and recorded trajectories support auditability.
Unsafe or unsupported recommendations block successful completion.
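The gating rule described above, combined with the Pass Rate column in the results table, could be sketched roughly as follows. This is an assumption about how such scoring might work, not the benchmark's published method: the `RubricItem` type, the `safety_critical` flag, the `threshold` parameter, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricItem:
    """One rubric criterion for a task (hypothetical structure)."""
    name: str
    passed: bool
    safety_critical: bool = False  # assumed flag marking safety constraints


def task_passes(items, threshold=1.0):
    """Return True if a task counts as successfully completed.

    Assumed rule: any failed safety-critical item blocks completion outright;
    otherwise the share of passed items must reach `threshold`.
    """
    if not items:
        return False
    if any(item.safety_critical and not item.passed for item in items):
        return False
    share_passed = sum(item.passed for item in items) / len(items)
    return share_passed >= threshold


def pass_rate(task_outcomes):
    """Fraction of tasks that passed; 0.0 when there are no tasks."""
    outcomes = list(task_outcomes)
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


if __name__ == "__main__":
    safe_task = [
        RubricItem("endpoint justified", True),
        RubricItem("dose within label", True, safety_critical=True),
    ]
    unsafe_task = [
        RubricItem("endpoint justified", True),
        RubricItem("dose within label", False, safety_critical=True),
    ]
    outcomes = [task_passes(safe_task), task_passes(unsafe_task)]
    print(outcomes, pass_rate(outcomes))
```

Treating safety items as a hard gate rather than as weighted points means a high score on reasoning cannot offset an unsafe recommendation.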