Cost Optimization

Last verified: 2026-04-17 · next review in 28 days

LLM API costs can grow quickly in production. Here are proven strategies to reduce them.

The Five Levers

1. Model Routing (40-70% savings)

Use cheap models for simple tasks, expensive models for quality tasks:

Task	Model	Cost/MTok Input
Classification, categorization	Haiku 4.5	$0.50 (batch)
Summarization, writing	Sonnet 4.6	$1.50 (batch)
Complex reasoning	Opus 4.7	$2.50 (batch)
Binary yes/no filtering	GPT-4.1-nano	$0.10

A classifier costs roughly $0.001 per call. Route 70% of traffic to cheap models.

2. Prompt Caching (60-90% savings)

Anthropic caches shared prompt content at 0.1x the input price:

system: [
  {
    type: 'text',
    text: 'Your system prompt + taxonomy + few-shot examples...',
    cache_control: { type: 'ephemeral' }, // Cache this block
  },
];

If your system prompt is 3,000 tokens and you process 200 articles, you pay full price once and 0.1x for 199 cached hits.

3. Batch API (50% flat discount)

Both Anthropic and OpenAI offer 50% off for async batch processing:

Submit batch of requests
Results within 24 hours (typically 5-30 minutes)
Stacks with prompt caching

Combined: 50% batch + 90% cache = up to 95% savings on repeated prompts.

4. Context Compaction (50-70% token reduction)

Strip unnecessary content before sending to LLM:

Remove HTML tags, ads, navigation
Truncate articles to first 3,000 characters
Extract only title + key paragraphs
A 2,000-word article often compresses to 800 meaningful words

5. Single-Call Multi-Output

Generate all output formats in one call:

{
  "headline": "...",
  "summary_short": "...",
  "summary_medium": "...",
  "category": "...",
  "scores": { ... }
}

One call with structured output is 3x cheaper than three separate calls.

Real-World Cost Example

Illustrative optimization path for 100 articles/day:

Optimization	Daily Cost (illustrative)
Naive (Sonnet for everything, no caching)	~$7.80
+ Batch API (50% off)	~$3.90
+ Prompt caching (90% on system prompt)	~$2.50
+ Model routing (Haiku for categorization)	~$1.50
+ Content compaction	~$1.00

These numbers show the shape of the savings, not audited production costs. The Agentic Universe pipeline's measured per-scenario numbers live in docs/architecture-notes.md → Cost Estimates — the Standard (100 articles/day) scenario runs ~$3.50/day today because ingestion volume varies and not every optimization above is fully deployed.

Treat this table as a savings ladder to climb, not as a benchmark to hit.

Tools for Tracking

Anthropic Dashboard — usage and cost per model (per-API-key breakdown)
Custom pipeline tracking — log tokens per stage in your database (what Agentic Universe does)
Budget alerts — set daily/monthly spend limits via provider console
Langfuse — open-source LLM observability (self-hosted or cloud); traces, token accounting, eval
Helicone — proxy-based observability; transparent per-request logging with cache/cost headers
PromptLayer — prompt versioning + cost tracking + eval; strong for iteration-heavy workflows

On this page

Cost Optimization

The Five Levers

1. Model Routing (40-70% savings)

2. Prompt Caching (60-90% savings)

3. Batch API (50% flat discount)

4. Context Compaction (50-70% token reduction)

5. Single-Call Multi-Output

Real-World Cost Example

Tools for Tracking

On this page

On this page

Cost Optimization

The Five Levers

1. Model Routing (40-70% savings)

2. Prompt Caching (60-90% savings)

3. Batch API (50% flat discount)

4. Context Compaction (50-70% token reduction)

5. Single-Call Multi-Output

Real-World Cost Example

Tools for Tracking

On this page