AI Source Attribution

Every value that EvidAI extracts can be traced back to its exact location in the original document. This transparency is essential for research integrity and regulatory compliance.

The Trust Problem

When AI extracts data from research papers, a critical question emerges: How do you know it's accurate?

Traditional AI Tools	EvidAI
"Sample size: 342"	"Sample size: 342"
Trust us	Click to see source →
No citation	Page 3, Paragraph 2: "A total of 342 patients were randomized..."
No verification	One-click PDF highlight

How Source Attribution Works

Extraction with Attribution

EXTRACTED DATA: Study PMD-29847362
═══════════════════════════════════════════════════════════════

Study Design: Randomized Controlled Trial
├── Confidence: 99%
├── Source: Methods, Page 4, Line 3
├── Quote: "This randomized, double-blind, placebo-controlled trial..."
└── 📄 [View in PDF]

Sample Size: 4,744 patients
├── Confidence: 99%
├── Source: Results, Page 7, Paragraph 1
├── Quote: "A total of 4744 patients underwent randomization"
└── 📄 [View in PDF]

Intervention: Dapagliflozin 10mg daily
├── Confidence: 98%
├── Source: Methods, Page 5
├── Quote: "Patients received dapagliflozin (10 mg once daily) or matching placebo"
└── 📄 [View in PDF]

Primary Outcome: CV death or HF hospitalization
├── Confidence: 97%
├── Source: Methods, Pages 5-6
├── Quote: "The primary outcome was a composite of worsening heart failure...
│           or death from cardiovascular causes"
└── 📄 [View in PDF]

Primary Result: HR 0.74 (95% CI 0.65-0.85)
├── Confidence: 96%
├── Source: Results, Table 2
├── Quote: [Cell D2: "0.74 (0.65-0.85)"]
└── 📄 [View in PDF - Cell Highlighted]

⚠️ NEEDS VERIFICATION:
Follow-up Duration
├── Confidence: 72%
├── Source: Multiple locations
├── Issue: "Median 18.2 months" in abstract vs "mean 18.4 months" in results
├── AI Suggestion: Use median from abstract for consistency with other trials
└── Action Required: Human verification

═══════════════════════════════════════════════════════════════
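The listing above pairs each value with a confidence score, a location, and a verbatim quote. As a rough illustration of that shape, here is a minimal sketch of a record type. The class and field names (`SourceRef`, `ExtractedField`, `confidence_pct`, and so on) are assumptions made for this example and are not EvidAI's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceRef:
    """Where a value was found. Field names are hypothetical."""
    section: str
    page: Optional[str] = None     # e.g. "7" or "5-6"
    locator: Optional[str] = None  # e.g. "Paragraph 1", "Table 2"
    quote: Optional[str] = None    # verbatim text from the document


@dataclass(frozen=True)
class ExtractedField:
    """One extracted value together with its attribution."""
    name: str
    value: str
    confidence_pct: int
    source: SourceRef

    def render(self) -> str:
        """Format the record in the tree layout shown in the listing above."""
        parts = (
            self.source.section,
            f"Page {self.source.page}" if self.source.page else None,
            self.source.locator,
        )
        where = ", ".join(p for p in parts if p)
        lines = [
            f"{self.name}: {self.value}",
            f"├── Confidence: {self.confidence_pct}%",
            f"├── Source: {where}",
        ]
        if self.source.quote:
            lines.append(f'├── Quote: "{self.source.quote}"')
        lines.append("└── 📄 [View in PDF]")
        return "\n".join(lines)


sample_size = ExtractedField(
    name="Sample Size",
    value="4,744 patients",
    confidence_pct=99,
    source=SourceRef(
        section="Results",
        page="7",
        locator="Paragraph 1",
        quote="A total of 4744 patients underwent randomization",
    ),
)
print(sample_size.render())
```

Keeping the quote on the record, rather than only a page number, is what lets a reviewer check the value without reopening the document.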

Visual Highlighting

PDF Viewer Integration

When you click "View in PDF", the source is highlighted:

┌─────────────────────────────────────────────────────────────────┐
│  📄 DAPA-HF_trial.pdf                              Page 7 of 12 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Results                                                        │
│                                                                 │
│  Between April 2017 and June 2019, we enrolled 4744             │
│  patients at 410 sites in 20 countries. █████████████████████   │
│  █████████████████████████████████████████████████████████████  │
│  │ A total of 4744 patients underwent randomization; 2373 were │ │
│  │ assigned to receive dapagliflozin and 2371 to receive      │ │
│  │ placebo.                                                    │ │
│  █████████████████████████████████████████████████████████████  │
│                                                                 │
│  The baseline characteristics of the patients were similar      │
│  in the two groups (Table 1). The mean age was 66.3 years,     │
│  and 76.6% were men...                                          │
│                                                                 │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐ │
│  │ This text was cited as the source for:                    │ │
│  │ Field: Sample Size                                         │ │
│  │ Value: 4,744 patients                                      │ │
│  │ Confidence: 99%                                            │ │
│  │                                                            │ │
│  │ [Accept Value] [Edit Value] [Flag Issue]                  │ │
│  └───────────────────────────────────────────────────────────┘ │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
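Highlighting a cited passage requires finding the quote inside the page text, where line breaks rarely match the quote exactly. The sketch below shows one plausible approach, a whitespace-tolerant search that returns character offsets. The function name `find_quote_span` is hypothetical, and this is not a description of how EvidAI's viewer works internally.

```python
import re
from typing import Optional, Tuple


def find_quote_span(page_text: str, quote: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of `quote` in `page_text`, or None.

    Any run of whitespace in the page (spaces, newlines) is allowed to
    match a single space in the quote, so wrapped lines still match.
    """
    words = quote.split()
    if not words:
        return None
    pattern = r"\s+".join(re.escape(word) for word in words)
    match = re.search(pattern, page_text)
    return match.span() if match else None


page = (
    "patients at 410 sites in 20 countries.\n"
    "A total of 4744 patients underwent\n"
    "randomization; 2373 were assigned"
)
span = find_quote_span(page, "A total of 4744 patients underwent randomization")
if span is not None:
    start, end = span
    print(repr(page[start:end]))
```

A real viewer would still need to map these character offsets to page coordinates, which depends on the PDF library in use and is left out here.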

Confidence Levels Explained

What Confidence Means

Confidence	Meaning	Action
95-100%	AI highly certain, clear source	Auto-accept available
85-94%	AI confident, source identified	Review recommended
70-84%	Some uncertainty	Human verification required
<70%	Significant uncertainty	Manual extraction suggested
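The bands in the table above can be expressed as a simple threshold lookup. This is a minimal sketch: the function name `review_action` is hypothetical, and treating each band's lower bound as a cutoff (so that 94.5% falls under "Review recommended") is an assumption, since the table only lists whole-number ranges.

```python
def review_action(confidence_pct: float) -> str:
    """Map a confidence percentage to the action listed in the table."""
    if not 0 <= confidence_pct <= 100:
        raise ValueError(f"confidence must be in [0, 100], got {confidence_pct}")
    if confidence_pct >= 95:
        return "Auto-accept available"
    if confidence_pct >= 85:
        return "Review recommended"
    if confidence_pct >= 70:
        return "Human verification required"
    return "Manual extraction suggested"


# The 72% follow-up duration example lands in the verification band.
print(review_action(72))
```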

Why Confidence Varies

Factor	Higher Confidence	Lower Confidence
Source clarity	Explicit statement	Implied or calculated
Document quality	Clean PDF, good OCR	Scanned, poor quality
Consistency	Same value throughout	Conflicting values
Location	Abstract, tables	Supplementary, figures

Table and Figure Extraction

Handling Structured Data

TABLE EXTRACTION: Table 2 - Primary and Secondary Outcomes

Extracted with cell-level attribution:

│ Outcome                    │ Dapagliflozin │ Placebo     │ HR (95% CI)      │
│─────────────────────────────────────────────────────────────────────────────│
│ Primary composite          │ 386 (16.3%)   │ 502 (21.2%) │ 0.74 (0.65-0.85) │
│ [Cell A2: 99%] [B2: 99%] [C2: 99%] [D2: 96%]                                │
│─────────────────────────────────────────────────────────────────────────────│
│ CV death                   │ 227 (9.6%)    │ 273 (11.5%) │ 0.82 (0.69-0.98) │
│ [Cell A3: 98%] [B3: 98%] [C3: 98%] [D3: 94%]                                │
│─────────────────────────────────────────────────────────────────────────────│

Click any cell to see it highlighted in the original PDF table.
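Cell labels such as D2 follow the familiar spreadsheet convention of column letters plus a row number. The sketch below converts a label into a (row, column) pair, assuming 1-based indexing with column A as 1. The helper name `parse_cell_ref` is hypothetical and used here only for illustration.

```python
import re
from typing import Tuple


def parse_cell_ref(ref: str) -> Tuple[int, int]:
    """Convert a label like "D2" into (row, column), both 1-based."""
    match = re.fullmatch(r"([A-Z]+)([1-9][0-9]*)", ref.strip().upper())
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    letters, digits = match.groups()
    column = 0
    for ch in letters:
        # Column letters form a base-26 number with A=1 ... Z=26.
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return int(digits), column


row, column = parse_cell_ref("D2")
print(f"Row {row}, Column {column}")
```

Under this convention, D2 is the hazard ratio cell for the primary composite outcome.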

Audit Trail Integration

Every Attribution is Logged

Source attribution feeds directly into audit trails:

AUDIT LOG: Data Extraction Event

Timestamp: 2024-12-22 10:34:17 UTC
User: AI Extraction Agent
Study: PMD-29847362

Extracted Field: Primary Outcome HR
├── Value: 0.74
├── Confidence: 96%
├── Source Document: DAPA-HF_trial.pdf
├── Source Location: Table 2, Row 2, Column 4
├── Source Text: "0.74 (0.65-0.85)"
├── Extraction Method: Table parsing + NER
└── Verification Status: Pending human review

Human Verification: 2024-12-22 11:15:44 UTC
├── Reviewer: Dr. Sarah Smith
├── Action: CONFIRMED
├── Note: "Verified against PDF. Value correct."
└── Signature: sha256:a7b3c9d4...
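The `sha256:` value in the verification entry suggests a content digest over the record. How EvidAI actually computes it is not stated, so the following is only a sketch under assumptions: the record is serialized as canonical JSON (sorted keys, compact separators) and hashed with SHA-256. The function name `record_digest` and the record keys are hypothetical. Note that a bare hash detects tampering but does not by itself prove who signed; a real signature would also involve a key.

```python
import hashlib
import json


def record_digest(record: dict) -> str:
    """Return a "sha256:<hex>" digest of a record (assumed scheme)."""
    # Sorted keys and fixed separators make the serialization
    # deterministic, so equal records always hash to the same value.
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


verification = {
    "timestamp": "2024-12-22T11:15:44Z",
    "reviewer": "Dr. Sarah Smith",
    "action": "CONFIRMED",
    "note": "Verified against PDF. Value correct.",
}
digest = record_digest(verification)
print(digest[:15] + "...")
```

Because the digest covers every field, changing the note or the action after the fact would produce a different value, which is what makes the log entry checkable later.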

Benefits

For Research Integrity

Verifiability: Any reviewer can check the AI's work
Reproducibility: Same extraction, same sources, same results
Error Detection: Catch mistakes before they propagate

For Regulatory Submissions

Transparency: Show auditors exactly where data came from
Defensibility: Support every extracted value with primary source
Compliance: Meet documentation requirements automatically

For Efficiency

Targeted Review: Focus verification effort on lower-confidence extractions
Quick Checks: One click to see source context
Batch Verification: Review multiple extractions simultaneously

Industry First: EvidAI is the only platform that provides complete source attribution for AI-extracted data, with built-in PDF highlighting and audit trail integration.