AI Source Attribution
Every value EvidAI extracts with AI can be traced back to its exact source in the original document: the section, page, and verbatim passage it came from. This transparency is essential for research integrity and regulatory compliance.
The Trust Problem
When AI extracts data from research papers, a critical question emerges: How do you know it's accurate?
| Traditional AI Tools | EvidAI |
|---|---|
| "Sample size: 342" | "Sample size: 342" |
| Trust us | Click to see source → |
| No citation | Page 3, Paragraph 2: "A total of 342 patients were randomized..." |
| No verification | One-click PDF highlight |
How Source Attribution Works
Extraction with Attribution
EXTRACTED DATA: Study PMD-29847362
═══════════════════════════════════════════════════════════════
Study Design: Randomized Controlled Trial
├── Confidence: 99%
├── Source: Methods, Page 4, Line 3
├── Quote: "This randomized, double-blind, placebo-controlled trial..."
└── 📄 [View in PDF]
Sample Size: 4,744 patients
├── Confidence: 99%
├── Source: Results, Page 7, Paragraph 1
├── Quote: "A total of 4744 patients underwent randomization"
└── 📄 [View in PDF]
Intervention: Dapagliflozin 10mg daily
├── Confidence: 98%
├── Source: Methods, Page 5
├── Quote: "Patients received dapagliflozin (10 mg once daily) or matching placebo"
└── 📄 [View in PDF]
Primary Outcome: Worsening HF or CV death
├── Confidence: 97%
├── Source: Methods, Pages 5-6
├── Quote: "The primary outcome was a composite of worsening heart failure...
│ or death from cardiovascular causes"
└── 📄 [View in PDF]
Primary Result: HR 0.74 [0.65-0.85]
├── Confidence: 96%
├── Source: Results, Table 2
├── Quote: [Cell D2: "0.74 (0.65-0.85)"]
└── 📄 [View in PDF - Cell Highlighted]
⚠️ NEEDS VERIFICATION:
Follow-up Duration
├── Confidence: 72%
├── Source: Multiple locations
├── Issue: "Median 18.2 months" in abstract vs "mean 18.4 months" in results
├── AI Suggestion: Use median from abstract for consistency with other trials
└── Action Required: Human verification
═══════════════════════════════════════════════════════════════
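Each entry above pairs a value with a location and a verbatim quote, and a field is flagged when its sources disagree. The Python sketch below shows one way such a record could be modeled. The `Source`/`Extraction` classes, the `reconcile` helper, and the rule "flag whenever candidate values differ" are hypothetical illustrations, not EvidAI's actual schema or logic.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    """Where a candidate value was found (hypothetical schema)."""
    location: str  # e.g. "Results, Page 7, Paragraph 1"
    quote: str     # verbatim text that supports the value


@dataclass
class Extraction:
    """One extracted field plus every source that supports it."""
    name: str
    value: str
    confidence: float  # 0-100, as shown in the interface
    sources: list[Source] = field(default_factory=list)
    needs_verification: bool = False
    issue: str = ""


def reconcile(name: str, candidates: list[tuple[str, float, Source]]) -> Extraction:
    """Merge the (value, confidence, source) candidates found for one field.

    Hypothetical rule: if all candidates agree, keep the value at its best
    confidence. If they disagree, still keep the highest-confidence value,
    but flag the field for human verification and record every quote.
    """
    if not candidates:
        raise ValueError(f"no candidates for {name!r}")
    value, confidence, _ = max(candidates, key=lambda c: c[1])
    sources = [src for _, _, src in candidates]
    if len({v for v, _, _ in candidates}) == 1:
        return Extraction(name, value, confidence, sources)
    issue = " vs ".join(f'"{s.quote}" in {s.location}' for _, _, s in candidates)
    return Extraction(name, value, confidence, sources,
                      needs_verification=True, issue=issue)


# Usage with the follow-up conflict above (confidence inputs are illustrative).
follow_up = reconcile("Follow-up Duration", [
    ("median 18.2 months", 72, Source("Abstract", "Median 18.2 months")),
    ("mean 18.4 months", 70, Source("Results", "mean 18.4 months")),
])
print(follow_up.needs_verification)  # True
print(follow_up.issue)
```

Keeping every source, not just the winning one, is what lets a reviewer see both quotes side by side before choosing a value.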
Visual Highlighting
PDF Viewer Integration
When you click "View in PDF", the source is highlighted:
┌─────────────────────────────────────────────────────────────────┐
│ 📄 DAPA-HF_trial.pdf Page 7 of 12 │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Results │
│ │
│ Between April 2017 and June 2019, we enrolled 4744 │
│ patients at 410 sites in 20 countries. █████████████████████ │
│ █████████████████████████████████████████████████████████████ │
│ │ A total of 4744 patients underwent randomization; 2373 were │ │
│ │ assigned to receive dapagliflozin and 2371 to receive │ │
│ │ placebo. │ │
│ █████████████████████████████████████████████████████████████ │
│ │
│ The baseline characteristics of the patients were similar │
│ in the two groups (Table 1). The mean age was 66.3 years, │
│ and 76.6% were men... │
│ │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ This text was cited as the source for: │ │
│ │ Field: Sample Size │ │
│ │ Value: 4,744 patients │ │
│ │ Confidence: 99% │ │
│ │ │ │
│ │ [Accept Value] [Edit Value] [Flag Issue] │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
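To draw a highlight like the one above, a viewer needs the character range of the cited quote within the page's extracted text. The sketch below is one plausible approach, not EvidAI's documented method. It assumes plain page text is available, and the `find_quote_span` name is hypothetical. Because PDF text extraction often inserts line breaks mid-sentence, any run of whitespace in the quote is allowed to match any run of whitespace on the page.

```python
import re
from typing import Optional, Tuple


def find_quote_span(page_text: str, quote: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of `quote` in `page_text`, or None.

    Whitespace runs are treated as equivalent; everything else must match
    exactly, since citations are verbatim quotes.
    """
    words = quote.split()
    if not words:
        return None
    # Escape each word so characters like "(" or "." are matched literally.
    pattern = r"\s+".join(re.escape(word) for word in words)
    match = re.search(pattern, page_text)
    return match.span() if match else None


page = (
    "Results\n"
    "A total of 4744 patients underwent randomization; 2373 were\n"
    "assigned to receive dapagliflozin and 2371 to receive\n"
    "placebo."
)
start, end = find_quote_span(page, "A total of 4744 patients underwent randomization")
print(page[start:end])  # A total of 4744 patients underwent randomization
```

The returned offsets can then be mapped to on-page coordinates by whatever PDF library renders the viewer.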
Confidence Levels Explained
What Confidence Means
| Confidence | Meaning | Action |
|---|---|---|
| 95-100% | AI highly certain, clear source | Auto-accept available |
| 85-94% | AI confident, source identified | Review recommended |
| 70-84% | Some uncertainty | Human verification required |
| <70% | Significant uncertainty | Manual extraction suggested |
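The tiers above amount to a threshold lookup. A minimal sketch, assuming confidence is a percentage from 0 to 100 and that a fractional score between bands (say 94.5) falls into the lower band; the `review_action` name is hypothetical.

```python
def review_action(confidence: float) -> str:
    """Map a 0-100 confidence score to the review action in the table above."""
    if not 0 <= confidence <= 100:
        raise ValueError(f"confidence must be in [0, 100], got {confidence}")
    # Check bands from highest to lowest; the first threshold met wins.
    if confidence >= 95:
        return "Auto-accept available"
    if confidence >= 85:
        return "Review recommended"
    if confidence >= 70:
        return "Human verification required"
    return "Manual extraction suggested"


print(review_action(99))  # Auto-accept available  (Sample Size)
print(review_action(72))  # Human verification required  (Follow-up Duration)
```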
Why Confidence Varies
| Factor | Higher Confidence | Lower Confidence |
|---|---|---|
| Source clarity | Explicit statement | Implied or calculated |
| Document quality | Clean PDF, good OCR | Scanned, poor quality |
| Consistency | Same value throughout | Conflicting values |
| Location | Abstract, tables | Supplementary, figures |
Table and Figure Extraction
Handling Structured Data
TABLE EXTRACTION: Table 2 - Primary and Secondary Outcomes
Extracted with cell-level attribution:
| Outcome | Dapagliflozin | Placebo | HR (95% CI) |
|---|---|---|---|
| Primary composite [A2: 99%] | 386 (16.3%) [B2: 99%] | 502 (21.2%) [C2: 99%] | 0.74 (0.65-0.85) [D2: 96%] |
| CV death [A3: 98%] | 227 (9.6%) [B3: 98%] | 273 (11.5%) [C3: 98%] | 0.82 (0.69-0.98) [D3: 94%] |
Click any cell to see it highlighted in the original PDF table.
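The interface labels cells spreadsheet-style (D2), while the audit log records the same cell as "Row 2, Column 4". A small sketch of the conversion between the two, assuming 1-based indices, lettered columns, and the table header occupying row 1; the helper names are hypothetical.

```python
import re
from typing import Tuple


def to_cell_ref(row: int, col: int) -> str:
    """Convert 1-based (row, col) to a label such as "D2"."""
    if row < 1 or col < 1:
        raise ValueError("row and col are 1-based")
    letters = ""
    while col:
        # Bijective base 26: 1 -> A, 26 -> Z, 27 -> AA.
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return f"{letters}{row}"


def from_cell_ref(ref: str) -> Tuple[int, int]:
    """Convert a label such as "D2" back to 1-based (row, col)."""
    match = re.fullmatch(r"([A-Z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    col = 0
    for ch in match.group(1):
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(match.group(2)), col


print(to_cell_ref(2, 4))    # D2
print(from_cell_ref("D2"))  # (2, 4)
```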
Audit Trail Integration
Every Attribution is Logged
Source attribution feeds directly into audit trails:
AUDIT LOG: Data Extraction Event
Timestamp: 2024-12-22 10:34:17 UTC
User: AI Extraction Agent
Study: PMD-29847362
Extracted Field: Primary Outcome HR
├── Value: 0.74
├── Confidence: 96%
├── Source Document: DAPA-HF_trial.pdf
├── Source Location: Table 2, Row 2, Column 4
├── Source Text: "0.74 (0.65-0.85)"
├── Extraction Method: Table parsing + NER
└── Verification Status: Pending human review
Human Verification: 2024-12-22 11:15:44 UTC
├── Reviewer: Dr. Sarah Smith
├── Action: CONFIRMED
├── Note: "Verified against PDF. Value correct."
└── Signature: sha256:a7b3c9d4...
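The log ends with a `sha256:` value, but this page does not say what is hashed. One common design, shown here purely as an assumption, hashes a canonical JSON serialization of each entry together with the previous entry's digest, so that editing an earlier entry changes every digest after it. A bare SHA-256 digest shows integrity, not identity; tying an entry to a specific reviewer would additionally need a cryptographic signature. The `fingerprint` and `verify_chain` names are hypothetical.

```python
import hashlib
import json


def fingerprint(entry: dict, prev_digest: str = "") -> str:
    """Return "sha256:<hex>" for an audit entry chained to its predecessor."""
    # sort_keys and fixed separators give one canonical string per entry,
    # so the digest does not depend on dict insertion order.
    payload = json.dumps(
        {"prev": prev_digest, "entry": entry},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify_chain(entries: list) -> bool:
    """Recompute each (entry, digest) pair in order; False on any mismatch."""
    prev = ""
    for entry, digest in entries:
        if fingerprint(entry, prev) != digest:
            return False
        prev = digest
    return True


extraction = {
    "timestamp": "2024-12-22 10:34:17 UTC",
    "field": "Primary Outcome HR",
    "value": "0.74",
    "source_location": "Table 2, Row 2, Column 4",
    "status": "Pending human review",
}
review = {
    "timestamp": "2024-12-22 11:15:44 UTC",
    "reviewer": "Dr. Sarah Smith",
    "action": "CONFIRMED",
}
d1 = fingerprint(extraction)
d2 = fingerprint(review, d1)
log = [(extraction, d1), (review, d2)]
print(verify_chain(log))  # True
extraction["value"] = "0.47"  # simulate tampering with the first entry
print(verify_chain(log))  # False
```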
Benefits
For Research Integrity
- Verifiability: Any reviewer can check the AI's work
- Reproducibility: Each value is recorded with its source, so re-checking the same document leads to the same passages and the same results
- Error Detection: Catch mistakes before they propagate
For Regulatory Submissions
- Transparency: Show auditors exactly where data came from
- Defensibility: Support every extracted value with primary source
- Compliance: Meet documentation requirements automatically
For Efficiency
- Targeted Review: Concentrate verification effort on low-confidence extractions
- Quick Checks: One click to see source context
- Batch Verification: Review multiple extractions simultaneously
Industry First: EvidAI is the only platform providing complete source attribution for AI-extracted data, with integrated PDF highlighting and audit trail integration.