AI Source Attribution

How EvidAI traces every AI-extracted value back to the original source

Every value that EvidAI extracts can be traced back to its exact source in the original document. This transparency is essential for research integrity and regulatory compliance.


The Trust Problem

When AI extracts data from research papers, a critical question emerges: How do you know it's accurate?

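Traditional AI Tools │ EvidAI
─────────────────────│──────────────────────────────────────────────────────────────────────
"Sample size: 342"   │ "Sample size: 342"
Trust us             │ Click to see source →
No citation          │ Page 3, Paragraph 2: "A total of 342 patients were randomized..."
No verification      │ One-click PDF highlight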

How Source Attribution Works

Extraction with Attribution

EXTRACTED DATA: Study PMD-29847362
═══════════════════════════════════════════════════════════════

Study Design: Randomized Controlled Trial
├── Confidence: 99%
├── Source: Methods, Page 4, Line 3
├── Quote: "This randomized, double-blind, placebo-controlled trial..."
└── 📄 [View in PDF]

Sample Size: 4,744 patients
├── Confidence: 99%
├── Source: Results, Page 7, Paragraph 1
├── Quote: "A total of 4744 patients underwent randomization"
└── 📄 [View in PDF]

Intervention: Dapagliflozin 10 mg daily
├── Confidence: 98%
├── Source: Methods, Page 5
├── Quote: "Patients received dapagliflozin (10 mg once daily) or matching placebo"
└── 📄 [View in PDF]

Primary Outcome: CV death or HF hospitalization
├── Confidence: 97%
├── Source: Methods, Pages 5-6
├── Quote: "The primary outcome was a composite of worsening heart failure...
│           or death from cardiovascular causes"
└── 📄 [View in PDF]

Primary Result: HR 0.74 [0.65-0.85]
├── Confidence: 96%
├── Source: Results, Table 2
├── Quote: [Cell D2: "0.74 (0.65-0.85)"]
└── 📄 [View in PDF - Cell Highlighted]

⚠️ NEEDS VERIFICATION:
Follow-up Duration
├── Confidence: 72%
├── Source: Multiple locations
├── Issue: "Median 18.2 months" in abstract vs "mean 18.4 months" in results
├── AI Suggestion: Use median from abstract for consistency with other trials
└── Action Required: Human verification

═══════════════════════════════════════════════════════════════
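
As a rough illustration, an attributed record like the ones above could be modeled as a value bundled with its confidence, location, and verbatim quote. This is a minimal sketch only: the class names, field names, and render() helper are hypothetical and are not EvidAI's actual schema or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceLocation:
    """Where in the document a value was found (hypothetical schema)."""
    section: str                    # e.g. "Results"
    page: Optional[int] = None      # 1-based page number, if known
    detail: Optional[str] = None    # e.g. "Paragraph 1" or "Table 2"


@dataclass(frozen=True)
class AttributedValue:
    """One extracted field plus the evidence that supports it."""
    name: str
    value: str
    confidence: float               # 0.0 to 1.0
    location: SourceLocation
    quote: str                      # verbatim text copied from the source

    def render(self) -> str:
        """Format the record in a tree layout like the one shown above."""
        parts = (
            self.location.section,
            f"Page {self.location.page}" if self.location.page else None,
            self.location.detail,
        )
        where = ", ".join(part for part in parts if part)
        return "\n".join([
            f"{self.name}: {self.value}",
            f"├── Confidence: {self.confidence:.0%}",
            f"├── Source: {where}",
            f'└── Quote: "{self.quote}"',
        ])


sample_size = AttributedValue(
    name="Sample Size",
    value="4,744 patients",
    confidence=0.99,
    location=SourceLocation("Results", page=7, detail="Paragraph 1"),
    quote="A total of 4744 patients underwent randomization",
)
print(sample_size.render())
```

Keeping the quote next to the value means a reviewer can compare the two without opening the document first.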

Visual Highlighting

PDF Viewer Integration

When you click "View in PDF", the source is highlighted:

┌─────────────────────────────────────────────────────────────────┐
│  📄 DAPA-HF_trial.pdf                              Page 7 of 12 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Results                                                        │
│                                                                 │
│  Between April 2017 and June 2019, we enrolled 4744             │
│  patients at 410 sites in 20 countries. █████████████████████   │
│  █████████████████████████████████████████████████████████████  │
│  │ A total of 4744 patients underwent randomization; 2373 were │ │
│  │ assigned to receive dapagliflozin and 2371 to receive      │ │
│  │ placebo.                                                    │ │
│  █████████████████████████████████████████████████████████████  │
│                                                                 │
│  The baseline characteristics of the patients were similar      │
│  in the two groups (Table 1). The mean age was 66.3 years,     │
│  and 76.6% were men...                                          │
│                                                                 │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐ │
│  │ This text was cited as the source for:                    │ │
│  │ Field: Sample Size                                         │ │
│  │ Value: 4,744 patients                                      │ │
│  │ Confidence: 99%                                            │ │
│  │                                                            │ │
│  │ [Accept Value] [Edit Value] [Flag Issue]                  │ │
│  └───────────────────────────────────────────────────────────┘ │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
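
Highlighting a source requires finding the quoted text within the page. The sketch below shows one simple way this could be done on plain page text, tolerating line wraps in the PDF text layer. The function name find_quote_span is hypothetical, and this is an assumption about the general technique, not a description of how EvidAI's viewer works; a real viewer would also need to map character offsets to page coordinates.

```python
import re
from typing import Optional, Tuple


def find_quote_span(page_text: str, quote: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of quote in page_text, or None.

    Any run of whitespace in the quote matches any run of whitespace on the
    page, so the quote is still found when the page wraps it across lines.
    """
    words = quote.split()
    if not words:
        return None
    pattern = r"\s+".join(re.escape(word) for word in words)
    match = re.search(pattern, page_text)
    return match.span() if match else None


page_text = (
    "patients at 410 sites in 20 countries. A total of 4744 patients\n"
    "underwent randomization; 2373 were assigned to receive dapagliflozin"
)
quote = "A total of 4744 patients underwent randomization"

span = find_quote_span(page_text, quote)
if span is not None:
    start, end = span
    print(repr(page_text[start:end]))
```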

Confidence Levels Explained

What Confidence Means

Confidence │ Meaning                         │ Action
───────────│─────────────────────────────────│─────────────────────────────
95-100%    │ AI highly certain, clear source │ Auto-accept available
85-94%     │ AI confident, source identified │ Review recommended
70-84%     │ Some uncertainty                │ Human verification required
<70%       │ Significant uncertainty         │ Manual extraction suggested
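
The bands above amount to a simple threshold lookup. The sketch below encodes them directly; the function name review_action is hypothetical, and treating each band's lower bound as inclusive for fractional scores (for example, 94.5% falls in the 85-94% band) is an assumption, since the table lists whole percentages only.

```python
def review_action(confidence_pct: float) -> str:
    """Map a confidence percentage to the recommended action from the table."""
    if not 0 <= confidence_pct <= 100:
        raise ValueError(f"confidence must be between 0 and 100, got {confidence_pct}")
    if confidence_pct >= 95:
        return "Auto-accept available"
    if confidence_pct >= 85:
        return "Review recommended"
    if confidence_pct >= 70:
        return "Human verification required"
    return "Manual extraction suggested"


for pct in (99, 90, 72, 55):
    print(f"{pct}% -> {review_action(pct)}")
```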

Why Confidence Varies

Factor           │ Higher Confidence     │ Lower Confidence
─────────────────│───────────────────────│───────────────────────
Source clarity   │ Explicit statement    │ Implied or calculated
Document quality │ Clean PDF, good OCR   │ Scanned, poor quality
Consistency      │ Same value throughout │ Conflicting values
Location         │ Abstract, tables      │ Supplementary, figures
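
The consistency factor can be illustrated with a small check that groups every mention of a field by its reported value and flags the field when more than one distinct value appears. This is a simplified sketch with hypothetical function names; it compares normalized strings only and does not reflect how EvidAI actually scores confidence.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_mentions(mentions: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each distinct reported value to the locations that report it."""
    by_value: Dict[str, List[str]] = defaultdict(list)
    for location, value in mentions:
        # Normalize case and surrounding whitespace before comparing.
        by_value[value.strip().lower()].append(location)
    return dict(by_value)


def has_conflict(mentions: List[Tuple[str, str]]) -> bool:
    """True when the same field is reported with more than one value."""
    return len(group_mentions(mentions)) > 1


follow_up = [
    ("Abstract", "Median 18.2 months"),
    ("Results", "mean 18.4 months"),
]
print(has_conflict(follow_up))
print(group_mentions(follow_up))
```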

Table and Figure Extraction

Handling Structured Data

TABLE EXTRACTION: Table 2 - Primary and Secondary Outcomes

Extracted with cell-level attribution:

│ Outcome           │ Dapagliflozin │ Placebo     │ HR (95% CI)      │
│────────────────────────────────────────────────────────────────────│
│ Primary composite │ 386 (16.3%)   │ 502 (21.2%) │ 0.74 (0.65-0.85) │
│ [Cell A2: 99%] [B2: 99%] [C2: 99%] [D2: 96%]                       │
│────────────────────────────────────────────────────────────────────│
│ CV death          │ 227 (9.6%)    │ 273 (11.5%) │ 0.82 (0.69-0.98) │
│ [Cell A3: 98%] [B3: 98%] [C3: 98%] [D3: 94%]                       │
│────────────────────────────────────────────────────────────────────│

Click any cell to see it highlighted in the original PDF table.
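
The cell labels in the table above (A2, B2, and so on) follow spreadsheet-style notation, with a column letter followed by a row number. The sketch below converts such a label into 1-based row and column numbers. The function name parse_cell_ref is hypothetical, and whether EvidAI stores cell positions this way is an assumption.

```python
import re
from typing import Tuple


def parse_cell_ref(ref: str) -> Tuple[int, int]:
    """Convert a reference such as "D2" to (row, column), both 1-based."""
    match = re.fullmatch(r"([A-Z]+)([1-9][0-9]*)", ref.strip().upper())
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    letters, digits = match.groups()
    column = 0
    for letter in letters:
        # Column letters are a base-26 number in which A=1 and Z=26.
        column = column * 26 + (ord(letter) - ord("A") + 1)
    return int(digits), column


row, column = parse_cell_ref("D2")
print(f"Row {row}, Column {column}")
```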

Audit Trail Integration

Every Attribution Is Logged

Source attribution feeds directly into audit trails:

AUDIT LOG: Data Extraction Event

Timestamp: 2024-12-22 10:34:17 UTC
User: AI Extraction Agent
Study: PMD-29847362

Extracted Field: Primary Outcome HR
├── Value: 0.74
├── Confidence: 96%
├── Source Document: DAPA-HF_trial.pdf
├── Source Location: Table 2, Row 2, Column 4
├── Source Text: "0.74 (0.65-0.85)"
├── Extraction Method: Table parsing + NER
└── Verification Status: Pending human review

Human Verification: 2024-12-22 11:15:44 UTC
├── Reviewer: Dr. Sarah Smith
├── Action: CONFIRMED
├── Note: "Verified against PDF. Value correct."
└── Signature: sha256:a7b3c9d4...
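
The log above does not specify how the sha256 value is produced. One common approach to making a log entry tamper-evident is to hash a canonical serialization of the record, as sketched below. The function name fingerprint_record and the record fields are hypothetical. Note that this is an unkeyed content hash, not a cryptographic signature: it detects changes to the record but does not prove who wrote it.

```python
import hashlib
import json


def fingerprint_record(record: dict) -> str:
    """Return a SHA-256 fingerprint of a record, independent of key order."""
    canonical = json.dumps(
        record,
        sort_keys=True,            # same output regardless of insertion order
        separators=(",", ":"),     # no incidental whitespace
        ensure_ascii=False,
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"


record = {
    "field": "Primary Outcome HR",
    "value": "0.74",
    "source_text": "0.74 (0.65-0.85)",
    "reviewer": "Dr. Sarah Smith",
    "action": "CONFIRMED",
}
fingerprint = fingerprint_record(record)
print(fingerprint)
```

Recomputing the fingerprint later and comparing it with the stored one shows whether any field of the entry has changed since it was logged.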

Benefits

For Research Integrity

  • Verifiability: Any reviewer can check the AI's work
  • Reproducibility: Same extraction, same sources, same results
  • Error Detection: Catch mistakes before they propagate

For Regulatory Submissions

  • Transparency: Show auditors exactly where data came from
  • Defensibility: Support every extracted value with primary source
  • Compliance: Meet documentation requirements automatically

For Efficiency

  • Targeted Review: Only verify low-confidence extractions
  • Quick Checks: One click to see source context
  • Batch Verification: Review multiple extractions simultaneously

Industry First: EvidAI is the only platform providing complete source attribution for AI-extracted data, with integrated PDF highlighting and audit trails.
