Claim Verification

DECON

Adversarial Disconfirmation Pipeline

An LLM asked to verify itself will confirm what it already said.

Single-model pipelines share blind spots: a model grading its own work tends to ratify it. Hallucinations are syntactically indistinguishable from accurate outputs, and confidence scores don't help. The only way to surface fabrication is to challenge claims externally, with independent evidence.

A 6-stage claim verification pipeline grounded in Popperian falsificationism. Extracts claims, subjects each to adversarial disconfirmation under epistemic isolation using a separate model family, verifies citations against external academic databases, reconciles findings with tiered evidence quality, and generates a self-contained analysis report — stress-testing what survives scrutiny rather than confirming what sounds right.

Six stages. Linear. No loops. No retries until it looks right.

DECON runs once and reports honestly on what it found. Each stage checkpoints via Lambda Durable Functions before passing to the next.

IN: Text Input → S1: Generate → S2: Extract → S3: Disconfirm → S4: Verify Citations → S5: Reconcile → S6: Report
Stage 1 — Generate (Conditional)

Text Generation

If the user submits a query instead of pre-existing text, Claude Sonnet 4.6 generates a response for analysis. If text is provided directly, this stage is skipped entirely. Temperature 0.7 for natural output.

Stage 2 — Claim Extraction

Atomic Claim Isolation

Claude Haiku 4.5 reads the full text and extracts every discrete factual claim as a neutrally-framed atomic statement. Each claim is typed (factual, statistical, temporal, causal, comparative, existential, or attributed assertion) and stripped of hedging, framing, and confidence language. Temperature 0.0 for deterministic extraction.
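The record the extractor emits can be sketched as a typed claim object. The field names and the hedge-stripping helper below are illustrative assumptions; in the pipeline itself, normalization is done by the model prompt, not post-processing.

```python
from dataclasses import dataclass
from enum import Enum

class ClaimType(Enum):
    FACTUAL = "factual"
    STATISTICAL = "statistical"
    TEMPORAL = "temporal"
    CAUSAL = "causal"
    COMPARATIVE = "comparative"
    EXISTENTIAL = "existential"
    ATTRIBUTED = "attributed"

@dataclass
class Claim:
    claim_id: str
    claim_text: str        # neutrally framed, hedging stripped
    claim_type: ClaimType

def strip_hedging(text: str) -> str:
    """Toy normalizer: drop common hedge prefixes before storing a claim.
    (Illustrative only -- real extraction happens inside the model.)"""
    for prefix in ("It seems that ", "Arguably, ", "It is believed that "):
        if text.startswith(prefix):
            text = text[len(prefix):]
    return text[0].upper() + text[1:]
```

A claim like "It seems that water boils at 100C" is stored as the bare assertion "Water boils at 100C", typed `FACTUAL`.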

Stage 3 — Adversarial Disconfirmation

Epistemically Isolated Challenge

DeepSeek V3.2 (via AWS Bedrock) receives ONLY claim_id, claim_text, and claim_type for each claim. No original text, no generator reasoning, no surrounding context. Claims are processed in batches of 5 to stay within DeepSeek's 8,192-token output limit. Each claim receives a verdict, reasoning chain, counterevidence with quality tier, entity verification status, and recommendation.

Stage 4 — Citation Verification

Retrieval-Backed Ground Truth

Citations produced by the disconfirmer are verified against external academic databases. CrossRef (primary) and Semantic Scholar (fallback) are queried for each unique citation by DOI, author-title match, or fuzzy search. Each citation is classified as confirmed, partial match, or unverified. Results are injected back into the disconfirmation report before reconciliation.
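A minimal sketch of the lookup cascade, with the CrossRef and Semantic Scholar clients injected as plain functions (hypothetical signatures) so the classification logic stands alone:

```python
import time
from typing import Callable, Optional

POLITENESS_DELAY = 0.2  # seconds between external requests

def verify_citation(citation: dict,
                    crossref: Callable[[dict], Optional[dict]],
                    semantic_scholar: Callable[[dict], Optional[dict]],
                    delay: float = POLITENESS_DELAY) -> str:
    """Classify one citation as 'confirmed', 'partial', or 'unverified'.

    `crossref` and `semantic_scholar` are injected lookup functions
    (hypothetical) that return a matched record or None."""
    for lookup in (crossref, semantic_scholar):  # primary, then fallback
        record = lookup(citation)
        time.sleep(delay)
        if record is None:
            continue
        # Exact DOI match confirms; author-title / fuzzy match is partial.
        if citation.get("doi") and record.get("doi") == citation["doi"]:
            return "confirmed"
        return "partial"
    return "unverified"
```

The real matching rules (author-title vs. fuzzy search) are more involved; the point is the primary/fallback cascade and the three-tier outcome.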

Stage 5 — Reconciliation

Final Arbiter

Claude Sonnet 4.6 receives the original text and the citation-enriched disconfirmation report. It weighs counterevidence by quality tier, cross-references citation verification status, identifies cases where the disconfirmer may have erred, performs pattern synthesis across all findings, and produces an executive summary, per-claim results, corrections needed, and overall confidence score.

Stage 6 — Report Generation

Formatted Output

Claude Sonnet 4.6 transforms the reconciliation output into a self-contained HTML report with executive summary, per-claim analysis with citation verification badges, evidence quality tiers, and APA-formatted references. Reports are stored in S3 and accessible via the client portal.

Three models. Two API routes. Deliberate epistemic diversity.

Claude models are accessed via the Anthropic API directly. DeepSeek is accessed via AWS Bedrock's Converse API. The routing logic detects the model ID and dispatches to the appropriate API.
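The dispatch can be sketched as a prefix check on the model ID; the exact ID strings and route names below are assumptions.

```python
def route(model_id: str) -> str:
    """Dispatch a model ID to its API route: Claude models go to the
    Anthropic API directly, DeepSeek goes through AWS Bedrock's
    Converse API. (Prefix checks are illustrative; actual IDs depend
    on provider naming.)"""
    if model_id.startswith("claude"):
        return "anthropic"
    if model_id.startswith("deepseek"):
        return "bedrock-converse"
    raise ValueError(f"No route for model: {model_id}")
```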

Extraction

Model: Claude Haiku 4.5
API: Anthropic direct
Temperature: 0.0
Max tokens: 4,096

Disconfirmation

Model: DeepSeek V3.2
API: AWS Bedrock
Temperature: 0.3
Max tokens: 8,192
Batch size: 5 claims

Citation Verification

Sources: CrossRef + Semantic Scholar
API: REST (external)
Politeness delay: 0.2s per request
Verification tiers: Confirmed / Partial / Unverified

Reconciliation

Model: Claude Sonnet 4.6
API: Anthropic direct
Temperature: 0.2
Max tokens: 64,000

Report Generation

Model: Claude Sonnet 4.6
API: Anthropic direct
Temperature: 0.2
Output: Self-contained HTML

Generation (Optional)

Model: Claude Sonnet 4.6
API: Anthropic direct
Temperature: 0.7
Max tokens: 4,096

Tested against thousands of inputs with human-labeled ground truth.

Evaluation suite across three categories: supported text, fabricated claims, and mixed content containing both real and fabricated claims.

83%
Fabricated content detection (hard detection)
91%
Mixed content detection
0
False positives on supported text
Thousands
Evaluation inputs scored against ground truth

Serverless. Durable. No orchestration overhead.

DECON uses Lambda Durable Functions rather than Step Functions. The entire pipeline runs as readable Python code with automatic checkpointing — not ASL JSON state machine definitions.
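The checkpoint-per-stage idea can be illustrated with a generic memoizing step-runner. This is not the Lambda Durable Functions API, just the resume-instead-of-recompute behavior it provides, with a dict standing in for the durable store and toy stage functions standing in for the real stages.

```python
def checkpointed(store: dict):
    """Return a step-runner: if a stage's result is already persisted,
    reuse it; otherwise run the stage and persist its output."""
    def step(name, fn, *args):
        if name not in store:
            store[name] = fn(*args)
        return store[name]
    return step

def run_pipeline(step, text: str) -> str:
    """Toy three-stage pipeline written as plain sequential Python."""
    claims = step("extract", lambda t: t.split(". "), text)
    verdicts = step("disconfirm", lambda cs: {c: "supported" for c in cs}, claims)
    return step("report", lambda v: f"{len(v)} claims checked", verdicts)
```

If an invocation dies mid-pipeline, a rerun against the same store skips completed stages and picks up where it left off, which is the property durable execution gives the real pipeline without any state-machine definition.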

Compute

Router Lambda: Python 3.14, arm64, 256MB
Light Engine Lambda: Python 3.14, arm64, 512MB
Durable execution: Enabled (900s timeout)
Retention: 7 days

Data

Job storage: DynamoDB (PAY_PER_REQUEST)
Primary key: job_id (UUID)
TTL: 30-day expiry
Secrets: AWS Secrets Manager

API

Gateway: API Gateway (REST)
POST /decon/light
GET /decon/status/{job_id}
CORS: Enabled (*)

Deployment

IaC: SAM (CloudFormation)
Stack: decon-stack
Region: us-east-1
CDN: CloudFront

Latency scales linearly with claim count.

Latency (20 claims, 4 batches)

Total duration: ~440 seconds
Generation: ~25 seconds
Extraction: ~13 seconds
Disconfirmation (4 batches): ~215 seconds
Citation verification: ~5 seconds
Reconciliation: ~90 seconds
Report generation: ~94 seconds
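Under the linear-scaling claim, a back-of-envelope estimator follows from the timings above: treat the other stages as roughly fixed and charge about 54 seconds per 5-claim disconfirmation batch. (Treating reconciliation and report generation as fixed is a simplification; in practice they grow with claim count too.)

```python
import math

# Approximate per-stage timings from the 20-claim benchmark (seconds).
FIXED = {"generation": 25, "extraction": 13, "citations": 5,
         "reconciliation": 90, "report": 94}
SECONDS_PER_BATCH = 215 / 4  # ~54s per 5-claim disconfirmation batch

def estimate_latency(n_claims: int, batch_size: int = 5) -> float:
    """Rough end-to-end latency: fixed stages plus per-batch cost."""
    n_batches = math.ceil(n_claims / batch_size)
    return sum(FIXED.values()) + n_batches * SECONDS_PER_BATCH
```

For 20 claims this reproduces the ~440-second benchmark; each additional batch of 5 claims adds roughly another minute.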

6-stage pipeline for PDF analysis with dual disconfirmers.

The Heavy Engine extends the Light pipeline for PDF document analysis. Designed and coded but not yet deployed. When the two independent disconfirmers disagree, the disagreement is flagged as a finding in its own right.

Stage 1

Document Parsing

Haiku 4.5 parses uploaded PDF documents into structured text for claim extraction.

Stage 2

Prioritized Extraction

Sonnet 4.6 extracts claims with a 0–50 priority scoring matrix across relevance, verifiability, impact, uniqueness, and dependencies.
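Assuming each of the five dimensions is scored 0-10 (the source only states a 0-50 total across these dimensions), the scoring can be sketched as:

```python
DIMENSIONS = ("relevance", "verifiability", "impact",
              "uniqueness", "dependencies")

def priority_score(scores: dict) -> int:
    """Sum five 0-10 dimension scores into a 0-50 priority total.
    (The per-dimension 0-10 scale is an assumption.)"""
    for dim in DIMENSIONS:
        if not 0 <= scores[dim] <= 10:
            raise ValueError(f"{dim} out of range: {scores[dim]}")
    return sum(scores[d] for d in DIMENSIONS)
```

Claims are then ranked by total, e.g. `sorted(claims, key=priority_score, reverse=True)`, so the highest-priority claims reach the primary disconfirmer first.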

Stage 3–4

Dual Disconfirmation

Primary: Claude Opus 4.6 for Priority 1–2 claims. Secondary: DeepSeek V3.2, independent of the primary. Disagreements flagged prominently.

Stage 5–6

Reconciliation & Report

Sonnet 4.6 reconciles with disagreement tracking. Final stage generates a formatted HTML report for downloadable output.

Ready to learn more?

Discuss how DECON can be integrated into your content workflow — whether you're validating AI-generated reports, auditing research, or building trust into published outputs.

Get in Touch

We'll respond within two business days.
