Adversarial Disconfirmation Pipeline
When you ask an LLM to verify its own output, it tends to confirm what it already said. Single-model pipelines share blind spots: hallucinations read exactly like accurate outputs, and confidence scores don't distinguish them. The only way to surface fabrication is to challenge claims externally, against independent evidence.
The Solution

A 6-stage claim verification pipeline grounded in Popperian falsificationism. It extracts claims, subjects each to adversarial disconfirmation under epistemic isolation using a separate model family, verifies citations against external academic databases, reconciles findings with tiered evidence quality, and generates a self-contained analysis report, stress-testing what survives scrutiny rather than confirming what sounds right.
DECON runs once and reports honestly on what it found. Each stage checkpoints via Lambda Durable Functions before passing to the next.
If the user submits a query instead of pre-existing text, Claude Sonnet 4.6 generates a response for analysis. If text is provided directly, this stage is skipped entirely. Temperature 0.7 for natural output.
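As a minimal sketch, the Stage 1 request might be assembled like this; the model identifier and token budget below are assumptions mirroring the shape of the Anthropic Messages API, not the pipeline's actual code:

```python
def build_generation_request(query: str) -> dict:
    """Assemble kwargs for a hypothetical generation call (Stage 1)."""
    return {
        "model": "claude-sonnet-4-6",  # assumed model identifier
        "max_tokens": 4096,            # assumed output budget
        "temperature": 0.7,            # natural output, per the pipeline spec
        "messages": [{"role": "user", "content": query}],
    }
```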
Claude Haiku 4.5 reads the full text and extracts every discrete factual claim as a neutrally-framed atomic statement. Each claim is typed (factual, statistical, temporal, causal, comparative, existential, or attributed assertion) and stripped of hedging, framing, and confidence language. Temperature 0.0 for deterministic extraction.
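The extraction contract can be sketched as a small data type; the class and field names below are illustrative, not the pipeline's actual schema:

```python
from dataclasses import dataclass

CLAIM_TYPES = {
    "factual", "statistical", "temporal", "causal",
    "comparative", "existential", "attributed",
}

@dataclass(frozen=True)
class Claim:
    claim_id: str
    claim_text: str   # neutral atomic statement, hedging and framing stripped
    claim_type: str   # one of CLAIM_TYPES

    def __post_init__(self):
        if self.claim_type not in CLAIM_TYPES:
            raise ValueError(f"unknown claim_type: {self.claim_type}")
```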
DeepSeek V3.2 (via AWS Bedrock) receives ONLY claim_id, claim_text, and claim_type for each claim. No original text, no generator reasoning, no surrounding context. Claims are processed in batches of 5 to stay within DeepSeek's 8,192-token output limit. Each claim receives a verdict, reasoning chain, counterevidence with quality tier, entity verification status, and recommendation.
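The isolation and batching steps reduce to a few lines of pure logic; this is a sketch with hypothetical function names, assuming claims arrive as dicts:

```python
def isolate(claim: dict) -> dict:
    """Strip a claim to the three fields the disconfirmer is allowed to see."""
    return {k: claim[k] for k in ("claim_id", "claim_text", "claim_type")}

def batch_claims(claims: list, batch_size: int = 5) -> list:
    """Group isolated claims into batches sized for the 8,192-token output cap."""
    isolated = [isolate(c) for c in claims]
    return [isolated[i:i + batch_size] for i in range(0, len(isolated), batch_size)]
```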
Citations produced by the disconfirmer are verified against external academic databases. CrossRef (primary) and Semantic Scholar (fallback) are queried for each unique citation by DOI, author-title match, or fuzzy search. Each citation is classified as confirmed, partial match, or unverified. Results are injected back into the disconfirmation report before reconciliation.
Claude Sonnet 4.6 receives the original text and the citation-enriched disconfirmation report. It weighs counterevidence by quality tier, cross-references citation verification status, identifies cases where the disconfirmer may have erred, performs pattern synthesis across all findings, and produces an executive summary, per-claim results, corrections needed, and overall confidence score.
Claude Sonnet 4.6 transforms the reconciliation output into a self-contained HTML report with executive summary, per-claim analysis with citation verification badges, evidence quality tiers, and APA-formatted references. Reports are stored in S3 and accessible via the client portal.
Claude models are accessed via the Anthropic API directly. DeepSeek is accessed via AWS Bedrock's Converse API. The routing logic detects the model ID and dispatches to the appropriate API.
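The dispatch described above can be sketched as a prefix check; the prefixes and return labels are illustrative assumptions:

```python
def route(model_id: str) -> str:
    """Pick the backing API from the model ID (prefixes are illustrative)."""
    if model_id.startswith("claude"):
        return "anthropic_api"
    if model_id.startswith("deepseek"):
        return "bedrock_converse"
    raise ValueError(f"no route for model: {model_id}")
```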
The evaluation suite spans three categories: fully supported text, fabricated claims, and mixed content containing both real and fabricated claims.
DECON uses Lambda Durable Functions rather than Step Functions: the entire pipeline runs as readable Python code with automatic checkpointing, not as ASL JSON state machine definitions.
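The checkpoint-and-resume behavior can be approximated in plain Python; this decorator is a simplified stand-in for what durable checkpointing provides, not its actual API:

```python
import json
import pathlib
import tempfile

CHECKPOINT_DIR = pathlib.Path(tempfile.mkdtemp())  # stand-in for durable storage

def checkpointed(stage):
    """Run a stage once; on re-invocation, replay its saved result instead."""
    def wrapper(state: dict) -> dict:
        path = CHECKPOINT_DIR / f"{stage.__name__}.json"
        if path.exists():
            return json.loads(path.read_text())
        result = stage(state)
        path.write_text(json.dumps(result))
        return result
    return wrapper
```

Replaying from the checkpoint means a failure in Stage 4 never re-runs Stages 1 through 3, which is the property the pipeline relies on between stages.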
The Heavy Engine extends the Light pipeline for PDF document analysis. It has been designed and coded but is not yet deployed. It adds a second, independent disconfirmer; when the two disagree, the disagreement is flagged as a finding in its own right.
Haiku 4.5 parses uploaded PDF documents into structured text for claim extraction.
Sonnet 4.6 extracts claims with a 0–50 priority scoring matrix across relevance, verifiability, impact, uniqueness, and dependencies.
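One plausible reading of the matrix, with each of the five dimensions scored 0–10; the per-dimension split is an assumption, since the source only gives the 0–50 total:

```python
DIMENSIONS = ("relevance", "verifiability", "impact", "uniqueness", "dependencies")

def priority_score(scores: dict) -> int:
    """Sum five 0-10 dimension scores into the 0-50 priority total (assumed split)."""
    total = 0
    for dim in DIMENSIONS:
        value = scores[dim]
        if not 0 <= value <= 10:
            raise ValueError(f"{dim} out of range: {value}")
        total += value
    return total
```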
Primary: Claude Opus 4.6 for Priority 1–2 claims. Secondary: DeepSeek V3.2, independent of the primary. Disagreements flagged prominently.
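The disagreement check reduces to comparing the two disconfirmers' verdict maps; a sketch, with field names assumed:

```python
def flag_disagreements(primary: dict, secondary: dict) -> list:
    """Return claims where the two independent disconfirmers reached different verdicts."""
    flags = []
    for claim_id, primary_verdict in primary.items():
        secondary_verdict = secondary.get(claim_id)
        if secondary_verdict is not None and secondary_verdict != primary_verdict:
            flags.append({
                "claim_id": claim_id,
                "primary": primary_verdict,
                "secondary": secondary_verdict,
            })
    return flags
```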
Sonnet 4.6 reconciles with disagreement tracking. Final stage generates a formatted HTML report for downloadable output.
Discuss how DECON can be integrated into your content workflow — whether you're validating AI-generated reports, auditing research, or building trust into published outputs.