Agentic Orchestration

Multi-agent debate, tree-of-thoughts, and manuscript judge loop capabilities

Beyond single-persona AI assistance, EvidAI supports composed agentic patterns: orchestrations that put multiple agents into structured deliberation, exploration, or iterative critique. These patterns trade additional latency and tokens for higher output quality, so they are feature-flagged and invoked selectively.

The Three Composed Patterns

1. Debate

Multiple agents argue different positions on the same question, a judge agent adjudicates, and the final answer reflects the deliberation rather than a single-model opinion.

When it helps

  • Screening calls that are genuinely ambiguous (reviewer-grade uncertainty)
  • Risk-of-bias domain judgments where the paper is borderline
  • Interpretive synthesis questions (heterogeneity drivers, effect modifiers)

How it works

  1. N named agents (e.g. Screening Analyst + Quality Assessor + Compliance Officer) each produce an independent position with reasoning.
  2. The transcript is passed to a judge agent (typically Compliance Officer or Academic Writer depending on the task).
  3. The judge produces the final answer, grounded in the shared transcript.
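
The three steps above can be sketched as follows. This is a minimal illustration, not EvidAI's implementation: `Agent`, `Position`, and `run_debate` are hypothetical names, and the stub agents stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Agent:
    """Hypothetical persona wrapper: a name plus a prompt -> text callable."""
    name: str
    respond: Callable[[str], str]


@dataclass
class Position:
    agent: str
    text: str


def run_debate(question: str, participants: List[Agent], judge: Agent) -> Dict:
    # Step 1: each participant answers independently; none sees another's position.
    positions = [Position(a.name, a.respond(question)) for a in participants]
    # Step 2: assemble the shared transcript that is handed to the judge.
    transcript = "\n".join(f"[{p.agent}] {p.text}" for p in positions)
    # Step 3: the judge produces the final answer from the transcript.
    verdict = judge.respond(f"Question: {question}\nTranscript:\n{transcript}")
    return {
        "answer": verdict,
        "attributed_to": judge.name,
        "contributors": [p.agent for p in positions],
        "transcript": transcript,
    }


# Stub agents with canned replies (assumptions for illustration only).
analyst = Agent("Screening Analyst", lambda q: "Include: population matches.")
assessor = Agent("Quality Assessor", lambda q: "Exclude: outcome not reported.")
judge = Agent(
    "Compliance Officer",
    lambda prompt: f"Reviewed {prompt.count('[')} positions: include, flag for full-text check.",
)

result = run_debate("Does study 12 meet the inclusion criteria?", [analyst, assessor], judge)
```

Keeping the participants blind to each other in step 1 is what makes the positions independent; only the judge sees the full transcript.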

2. Tree of Thoughts

A single agent explores multiple thought branches in parallel, scores each branch, and picks the best path. It is used when the answer involves a decision tree (e.g. "which effect measure should we pool, given this mix of outcomes and reporting conventions?").

When it helps

  • Search strategy design — explore multiple query strategies, score recall-precision tradeoff
  • Synthesis method selection — explore narrative vs meta-analytic vs NMA paths given the evidence base
  • Manuscript section drafting — explore multiple thesis framings before committing

How it works

  1. The agent generates N candidate branches for the next step.
  2. Each branch is scored (typically by self-critique or a lightweight evaluator pass).
  3. The best branch is expanded to the next depth, repeating until the budget is spent or a terminal branch is selected.
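
One way to read the steps above is as a greedy search that keeps only the top-scoring branch at each depth. The sketch below assumes that reading; `tree_of_thoughts`, `propose`, `score`, and `is_terminal` are hypothetical names, and the option table and scores are invented placeholders rather than methodological recommendations.

```python
from typing import Callable, List, Tuple


def tree_of_thoughts(
    root: str,
    propose: Callable[[str, int], List[str]],
    score: Callable[[str], float],
    is_terminal: Callable[[str], bool],
    n_branches: int = 3,
    max_depth: int = 4,
) -> Tuple[str, List[str]]:
    """Greedy expansion: at each depth, keep only the best-scoring candidate."""
    current, path = root, [root]
    for _ in range(max_depth):  # the depth budget
        if is_terminal(current):
            break
        candidates = propose(current, n_branches)  # step 1: generate N branches
        if not candidates:
            break
        current = max(candidates, key=score)  # steps 2-3: score, expand the best
        path.append(current)
    return current, path


# Toy decision tree for synthesis method selection (placeholder values).
OPTIONS = {
    "start": ["narrative", "meta-analysis", "NMA"],
    "meta-analysis": ["meta-analysis/fixed-effect", "meta-analysis/random-effects"],
}
SCORES = {
    "narrative": 0.4,
    "meta-analysis": 0.8,
    "NMA": 0.5,
    "meta-analysis/fixed-effect": 0.3,
    "meta-analysis/random-effects": 0.9,
}

best, path = tree_of_thoughts(
    "start",
    propose=lambda state, n: OPTIONS.get(state, [])[:n],
    score=lambda state: SCORES.get(state, 0.0),
    is_terminal=lambda state: state not in OPTIONS,
)
```

In a real run `propose` and `score` would be model calls; keeping more than one branch per depth would turn this into a beam search at a proportionally higher token cost.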

3. Manuscript Judge Loop

An iterative critique-and-rewrite loop over a manuscript section. It targets publication-grade prose at the cost of extra revision passes.

When it helps

  • Abstract, Methods, and Discussion sections that demand precision and journal-voice matching
  • HTA / regulatory sections where compliance-checklist alignment is critical

How it works

  1. Academic Writer drafts the section.
  2. Compliance Officer (or a journal-style judge) critiques against checklist items and voice.
  3. Academic Writer revises in response to the critique.
  4. Loop until the judge clears the section or the revision budget is spent.
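
The loop above can be sketched as a bounded draft, critique, revise cycle. The names `judge_loop` and `Critique`, the revision budget of three, and the two-item checklist are assumptions for illustration; the stub functions stand in for the Academic Writer and the judge.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Critique:
    cleared: bool
    notes: List[str]  # checklist items the judge found missing


def judge_loop(
    draft_fn: Callable[[], str],
    critique_fn: Callable[[str], Critique],
    revise_fn: Callable[[str, Critique], str],
    max_revisions: int = 3,
) -> Tuple[str, bool, List[str]]:
    draft = draft_fn()  # step 1: writer drafts
    history = [draft]
    for _ in range(max_revisions):
        critique = critique_fn(draft)  # step 2: judge critiques
        if critique.cleared:
            return draft, True, history  # step 4: judge clears the section
        draft = revise_fn(draft, critique)  # step 3: writer revises
        history.append(draft)
    # Budget spent: run one last check so the caller knows the final status.
    return draft, critique_fn(draft).cleared, history


# Stub writer and judge working against a hypothetical checklist.
REQUIRED = ["eligibility criteria", "search strategy"]


def draft_fn() -> str:
    return "Methods: we describe the eligibility criteria."


def critique_fn(text: str) -> Critique:
    missing = [item for item in REQUIRED if item not in text]
    return Critique(cleared=not missing, notes=missing)


def revise_fn(text: str, critique: Critique) -> str:
    return text + "".join(f" We also report the {item}." for item in critique.notes)


final, cleared, history = judge_loop(draft_fn, critique_fn, revise_fn)
```

Returning the cleared flag alongside the text lets a caller distinguish a section the judge approved from one that simply ran out of revision budget.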

Default Behavior

These orchestrated patterns are off by default in production and are enabled per organization. When a pattern is disabled, requests fall back automatically to the standard single-agent path, so features keep working and the output is simply attributed to a single agent. Turning orchestration on is an organization-level decision with cost and latency implications, and it is reversible at any time.
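
The fallback can be pictured as a flag check in front of each pattern. The sketch below is an assumption about shape only: `OrgSettings`, `route`, and the pattern key `"debate"` are hypothetical names, not EvidAI configuration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class OrgSettings:
    # Hypothetical per-organization flags; a missing flag means "off".
    flags: Dict[str, bool] = field(default_factory=dict)

    def is_enabled(self, pattern: str) -> bool:
        return self.flags.get(pattern, False)


def route(
    question: str,
    pattern: str,
    org: OrgSettings,
    single_agent: Callable[[str], str],
    orchestrated: Dict[str, Callable[[str], str]],
) -> str:
    runner = orchestrated.get(pattern)
    if runner is not None and org.is_enabled(pattern):
        return runner(question)
    # Flag off or pattern unknown: fall back to the single-agent path.
    return single_agent(question)


# Stub runners standing in for the real single-agent and debate paths.
single = lambda q: f"single-agent answer to: {q}"
patterns = {"debate": lambda q: f"debate answer to: {q}"}

default_org = OrgSettings()                      # nothing enabled
opted_in = OrgSettings(flags={"debate": True})   # organization opted in

off_result = route("Include study 12?", "debate", default_org, single, patterns)
on_result = route("Include study 12?", "debate", opted_in, single, patterns)
```

Because the default is read with `flags.get(pattern, False)`, removing a flag has the same effect as setting it to false, which matches the reversible opt-in described above.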

Transparency

Every orchestrated run writes one event per sub-step to the AI transparency log, so a reviewer can trace:

  • Which agents participated
  • What each one said
  • How the judge decided
  • Total token spend and wall-clock latency

These events are included in export bundles for full reproducibility.
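
As an illustration of what one event per sub-step might carry, the sketch below uses a hypothetical `StepEvent` record and a `summarize` helper. The field names, token counts, and latencies are invented for the example and do not describe the actual log schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class StepEvent:
    run_id: str
    step: int
    agent: str       # named persona, not a model identifier
    role: str        # "participant" or "judge"
    output: str
    tokens: int
    latency_ms: int


def summarize(events: List[StepEvent]) -> Dict:
    return {
        "agents": sorted({e.agent for e in events}),
        "total_tokens": sum(e.tokens for e in events),
        # Summing assumes sequential steps; parallel steps would overlap in wall-clock time.
        "total_latency_ms": sum(e.latency_ms for e in events),
    }


events = [
    StepEvent("run-1", 1, "Screening Analyst", "participant", "Include.", 120, 800),
    StepEvent("run-1", 2, "Quality Assessor", "participant", "Exclude.", 140, 950),
    StepEvent("run-1", 3, "Compliance Officer", "judge", "Include, flagged.", 90, 600),
]

summary = summarize(events)
# Serialize every event so the full deliberation can travel with an export.
export_payload = json.dumps([asdict(e) for e in events], indent=2)
```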

Attribution Rules

  • Named agents only — orchestrated patterns still attribute every sub-step to one of the ten named personas, never a raw model slug.
  • Judge carries final attribution — the final answer is attributed to the judge agent, with the participant agents listed as contributors.
  • Audit trail is chain-linked — orchestrated events participate in the 21 CFR Part 11 chain, so the entire deliberation is tamper-evident.
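
This article does not describe how the chain is built. A common way to make a log tamper-evident is a hash chain, where each record stores the hash of the record before it. The sketch below assumes that technique with SHA-256; `append_event`, `verify`, and the record layout are hypothetical.

```python
import copy
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(payload: Dict, prev_hash: str) -> str:
    # Canonical JSON so the same payload always hashes to the same value.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: List[Dict], payload: Dict) -> None:
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"payload": payload, "prev_hash": prev_hash, "hash": _digest(payload, prev_hash)}
    )


def verify(chain: List[Dict]) -> bool:
    prev = GENESIS
    for record in chain:
        if record["prev_hash"] != prev:
            return False  # link to the previous record is broken
        if record["hash"] != _digest(record["payload"], prev):
            return False  # payload was altered after it was written
        prev = record["hash"]
    return True


chain: List[Dict] = []
append_event(chain, {"agent": "Screening Analyst", "output": "Include."})
append_event(chain, {"agent": "Quality Assessor", "output": "Exclude."})
append_event(chain, {"agent": "Compliance Officer", "output": "Include, flagged."})
intact = verify(chain)

# Editing an earlier record invalidates its stored hash.
tampered = copy.deepcopy(chain)
tampered[0]["payload"]["output"] = "Exclude."
tampered_ok = verify(tampered)
```

Because each hash covers the previous one, changing any sub-step of a deliberation is detectable from that record onward, including the judge's final entry.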

When NOT to Use

Orchestrated patterns cost more in tokens and latency than a single-agent call. Use single-persona agents when:

  • The task is well-specified and single-perspective (e.g. extracting sample_size from a well-structured table)
  • Reviewer-grade uncertainty is low
  • Turnaround time matters more than marginal quality gains

The platform's default configuration is single-persona; orchestration is opt-in per organization and per feature flag.
