Apr 22, 2026 · 1 min read · Applications & Use Cases

Production AI fact-checker uses two-stage pipeline to gate model calls

Ajay Mahadeven details the architecture of a production-grade AI fact-checker built at Economic Data Sciences, centering on a guardrail classifier that screens inputs before a more expensive analyzer ever runs.

Dev.to #llm · Ajay Mahadeven

Composite score: 6.2 out of 10

Score weights: Novelty 25% · Impact 43% · Credibility 12% · Depth 20%

Why it matters

Adopt the classifier-as-architectural-gate pattern in your own agentic pipelines to cut costs, improve output quality, and block harmful inputs before they reach expensive or capable models.

Summary: our read of the original

Ajay Mahadeven of Economic Data Sciences presents the architecture of a production AI fact-checker, framing it around a question every AI engineer must eventually answer: should a given input go to the AI model at all? The system accepts five input types (text, PDF, CSV, DOCX, and Markdown) and returns a structured result: a verdict (TRUE / FALSE / DISPUTED / UNVERIFIABLE), a confidence score, reasoning, and cited sources with credibility ratings.
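As an illustration of the structured result described above, here is a minimal sketch in Python. The class and field names (`FactCheckResult`, `Source`, `url`, `credibility`) and the 0.0 to 1.0 scales are our assumptions; the original article does not publish its schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    """The four verdicts named in the article."""
    TRUE = "TRUE"
    FALSE = "FALSE"
    DISPUTED = "DISPUTED"
    UNVERIFIABLE = "UNVERIFIABLE"


@dataclass
class Source:
    url: str            # hypothetical field name
    credibility: float  # assumed 0.0-1.0 scale; the article does not state one


@dataclass
class FactCheckResult:
    verdict: Verdict
    confidence: float   # assumed 0.0-1.0 scale
    reasoning: str
    sources: list[Source] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject out-of-range confidence early instead of storing bad data.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
```

Using an enum for the verdict means any label outside the four listed values fails at construction time rather than reaching the database or the caller.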

The pipeline enforces a strict maximum of two AI calls per request. Every input is first normalized and hashed (SHA-256), then checked against a database cache; a cache hit returns immediately with zero AI calls. On a miss, a spending guard queries actual token costs from the database before proceeding. If the monthly USD cap has not been reached, the guardrail classifier runs as the first AI call: a tightly scoped prompt that labels the input either as a VALID claim or as one of five INVALID categories (INFORMATIONAL, OPINION, MATH, IRRELEVANT, HARMFUL). Invalid inputs are rejected immediately and stored as training data; only valid claims advance to the fact-check analyzer, which is the second AI call. The asymmetry is intentional: the classifier uses roughly a tenth of the tokens the analyzer does, so stopping garbage input early saves money, improves output quality, and prevents harmful content from reaching the more capable model.
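The control flow above can be sketched in a few lines of Python. This is our illustration, not the author's code: `check_claim`, `claim_key`, the status strings, and the normalization rule (trim, collapse whitespace, lowercase) are all assumptions, and the two AI calls are passed in as plain callables so the gating logic stands alone.

```python
import hashlib
from typing import Callable


def claim_key(text: str) -> str:
    """Normalize, then hash with SHA-256.

    Assumed normalization: trim, collapse whitespace, lowercase.
    The article does not spell out its normalization rules.
    """
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def check_claim(
    text: str,
    cache: dict,                        # stands in for the DB cache
    under_budget: Callable[[], bool],   # the spending guard
    classify: Callable[[str], str],     # AI call 1: cheap guardrail
    analyze: Callable[[str], dict],     # AI call 2: expensive analyzer
    rejected: list,                     # stands in for stored training data
) -> dict:
    key = claim_key(text)
    if key in cache:
        return cache[key]               # cache hit: zero AI calls
    if not under_budget():
        return {"status": "BUDGET_EXCEEDED"}
    label = classify(text).strip().upper()
    if label != "VALID":
        rejected.append((text, label))  # keep the rejection for later training
        return {"status": "REJECTED", "category": label}
    result = analyze(text)              # only valid claims reach this call
    cache[key] = result
    return result
```

Each early return sits before the next, more expensive step, which is what holds the request to two AI calls at most.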

Beyond the pipeline, the system abstracts across three cloud providers (Azure AI Foundry, AWS Bedrock, GCP Vertex AI) switchable via a single environment variable, and supports primary, round-robin, and fallback model rotation strategies. Uploaded files are extracted, analyzed, and deleted with no retention. An MCP server exposes the entire pipeline as a callable tool for Claude Desktop and Claude Code. Every token from every AI call is stored in the database, and before each call the spending guard queries these real numbers, rather than estimates or request counts, to enforce the monthly cap.
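One way to picture the provider switch and the three rotation strategies is the sketch below. Everything named here is hypothetical: the `AI_PROVIDER` variable, the short provider names, the `ModelRotator` class, and our reading of each strategy (primary always uses the first model, round-robin cycles through them, fallback tries each in order until one succeeds). The article names the strategies but does not define them.

```python
import os
from itertools import count
from typing import Callable, Mapping, Sequence

PROVIDERS = {"azure", "aws", "gcp"}  # hypothetical short names
STRATEGIES = {"primary", "round_robin", "fallback"}


def active_provider(env: Mapping[str, str] = os.environ) -> str:
    """Read the provider from one environment variable (name is assumed)."""
    name = env.get("AI_PROVIDER", "azure").lower()
    if name not in PROVIDERS:
        raise ValueError(f"unknown provider: {name}")
    return name


class ModelRotator:
    def __init__(self, models: Sequence[str], strategy: str = "primary"):
        if not models:
            raise ValueError("at least one model is required")
        if strategy not in STRATEGIES:
            raise ValueError(f"unknown strategy: {strategy}")
        self.models = list(models)
        self.strategy = strategy
        self._counter = count()

    def call(self, invoke: Callable[[str], str]) -> str:
        if self.strategy == "primary":
            return invoke(self.models[0])
        if self.strategy == "round_robin":
            index = next(self._counter) % len(self.models)
            return invoke(self.models[index])
        # fallback: try each model in order, return the first success
        last_error = None
        for model in self.models:
            try:
                return invoke(model)
            except Exception as exc:
                last_error = exc
        raise RuntimeError("all models failed") from last_error
```

Passing `invoke` as a callable keeps the rotation logic independent of any particular provider SDK.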

Key facts

01 The fact-checker accepts five input types: text, PDF, CSV, DOCX, and Markdown, returning verdicts of TRUE / FALSE / DISPUTED / UNVERIFIABLE with a confidence score and cited sources.
02 A two-stage AI pipeline runs a guardrail classifier before the fact-check analyzer, capping the system at two AI calls maximum per request.
03 The classifier runs at temperature 0 with a max of 100 tokens, making it roughly 10x cheaper than the analyzer call it gates.
04 Invalid inputs are categorized as INFORMATIONAL, OPINION, MATH, IRRELEVANT, or HARMFUL and rejected before the analyzer runs.
05 DB-level caching uses SHA-256 hashing of normalized claims; a cache hit returns immediately with zero AI calls.
06 A spending guard queries actual stored token costs from the database before every AI call to enforce a monthly USD cap.
07 The system abstracts across Azure AI Foundry, AWS Bedrock, and GCP Vertex AI, switchable via a single environment variable, and exposes the full pipeline via an MCP server for Claude Desktop and Claude Code.

Topics

#ai-applications #production-systems #llm-pipelines #cost-optimization #system-design

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Apr 22, 2026 · 19:13 UTC.
