LLM API costs can grow quickly in production. Here are proven strategies to reduce them.
Use cheap models for simple tasks and reserve expensive models for quality-critical ones:
| Task | Model | Cost/MTok Input |
|---|---|---|
| Classification, categorization | Haiku 4.5 | $0.50 (batch) |
| Summarization, writing | Sonnet 4.6 | $1.50 (batch) |
| Complex reasoning | Opus 4.7 | $2.50 (batch) |
| Binary yes/no filtering | GPT-4.1-nano | $0.10 |
A classifier costs roughly $0.001 per call. Route 70% of traffic to cheap models.
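The routing idea above can be sketched as a lookup from task type to model tier. This is a minimal sketch, not the pipeline's actual router: the task labels, `routeModel`, and `estimateInputCost` are hypothetical names, and the model strings are display names copied from the table rather than real API model IDs.

```typescript
// Hypothetical task labels; a real pipeline would define its own.
type Task = "classification" | "summarization" | "reasoning" | "binary_filter";

interface Route {
  model: string;            // display name from the table, not a real API model ID
  inputCostPerMTok: number; // USD per million input tokens, from the table above
}

const ROUTES: Record<Task, Route> = {
  classification: { model: "Haiku 4.5", inputCostPerMTok: 0.5 },
  summarization: { model: "Sonnet 4.6", inputCostPerMTok: 1.5 },
  reasoning: { model: "Opus 4.7", inputCostPerMTok: 2.5 },
  binary_filter: { model: "GPT-4.1-nano", inputCostPerMTok: 0.1 },
};

// Picks the model tier for a task.
function routeModel(task: Task): Route {
  return ROUTES[task];
}

// Estimated input cost in USD for one call with the given number of input tokens.
function estimateInputCost(task: Task, inputTokens: number): number {
  return (inputTokens / 1_000_000) * routeModel(task).inputCostPerMTok;
}
```

Keeping the routes in one table makes it easy to see which tasks are on expensive tiers and to move a task down a tier once its quality has been checked.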
Anthropic caches shared prompt content at 0.1x the input price:
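```typescript
// Fragment of the Messages API request body
system: [
  {
    type: 'text',
    text: 'Your system prompt + taxonomy + few-shot examples...',
    cache_control: { type: 'ephemeral' }, // Cache this block
  },
],
```

If your system prompt is 3,000 tokens and you process 200 articles, you pay for the prompt in full once, on the request that writes the cache (Anthropic prices cache writes somewhat above the base input rate), and 0.1x for the 199 cached hits, as long as they arrive before the cache entry expires.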
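The arithmetic behind the 3,000-token, 200-article example can be worked through as follows. This is a sketch under stated assumptions: `cachedPromptTokens` is a hypothetical helper, and `writeMultiplier` is a parameter added here so a premium on the cache-writing request can be modeled. With its default of 1 it matches the "full price once" simplification.

```typescript
// Billable system-prompt tokens, expressed in full-price token equivalents.
function cachedPromptTokens(
  promptTokens: number,
  requests: number,
  readMultiplier = 0.1,  // cached reads billed at 0.1x, per the text above
  writeMultiplier = 1.0, // assumption: the first request is billed at full price
): number {
  if (requests <= 0) return 0;
  const firstRequest = promptTokens * writeMultiplier;
  const cachedHits = promptTokens * readMultiplier * (requests - 1);
  return firstRequest + cachedHits;
}

// Without caching, every request pays for the full prompt.
const uncached = 3_000 * 200;                  // 600,000 token equivalents
// With caching: 3,000 once, then 199 hits at 300 each.
const cached = cachedPromptTokens(3_000, 200); // about 62,700 token equivalents
const savings = 1 - cached / uncached;         // about 0.8955, i.e. roughly 90%
```

The saving applies only to the cached system prompt. Per-article content that changes on every request is still billed at the normal input rate.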
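Both Anthropic and OpenAI offer 50% off for asynchronous batch processing, where requests are submitted in bulk and results are returned later rather than immediately.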
Combined, the discounts multiply: 50% batch × 0.1x cached reads = 0.05x the list input price, or up to 95% savings on the cached portion of repeated prompts.
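The 95% figure is an upper bound that holds only when all input tokens are cache hits. A rough sketch of the blended rate is below, assuming the two discounts stack multiplicatively (which is what the 95% figure implies) and ignoring output tokens. The function name and the `cachedFraction` parameter are hypothetical.

```typescript
// Fraction of the list input price actually paid when batch and cache discounts stack.
function effectiveInputMultiplier(
  cachedFraction: number,      // share of input tokens served from cache (0..1)
  batchMultiplier = 0.5,       // 50% batch discount, per the text above
  cacheReadMultiplier = 0.1,   // cached tokens at 0.1x, per the text above
): number {
  const cachedPart = cachedFraction * cacheReadMultiplier;
  const uncachedPart = 1 - cachedFraction;
  return batchMultiplier * (cachedPart + uncachedPart);
}

// Fully cached input: 0.5 * 0.1 = 0.05 of list price.
const bestCase = effectiveInputMultiplier(1);
// Hypothetical mix where 60% of input tokens are cached: 0.5 * (0.06 + 0.4) = 0.23.
const mixedCase = effectiveInputMultiplier(0.6);
```

In the hypothetical 60% case the saving on input is about 77% rather than 95%, which is why a large, stable system prompt benefits most from combining the two.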
Strip unnecessary content before sending text to the LLM, because every extra input token is billed.
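One possible compaction step is sketched below. It assumes the input is HTML and that markup, scripts, styles, and repeated whitespace count as unnecessary. The `compactContent` name and the `maxChars` cap are hypothetical, and the regex-based tag stripping is a rough heuristic, not a substitute for a real HTML parser.

```typescript
// Minimal compaction sketch: removes script/style blocks and tags,
// collapses whitespace, and caps the result at maxChars characters.
function compactContent(html: string, maxChars = 8_000): string {
  const text = html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop script/style blocks
    .replace(/<[^>]+>/g, " ") // drop remaining tags
    .replace(/\s+/g, " ")     // collapse runs of whitespace
    .trim();
  return text.length > maxChars ? text.slice(0, maxChars) : text;
}
```

For example, `compactContent("<p>Hello   <b>world</b></p>")` returns `"Hello world"`. Truncating by characters is a blunt cap; a production pipeline might instead cut at paragraph boundaries or count tokens.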
Generate all output formats in one call:
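```json
{
  "headline": "...",
  "summary_short": "...",
  "summary_medium": "...",
  "category": "...",
  "scores": { ... }
}
```

One call with structured output is roughly 3x cheaper on input tokens than three separate calls, because the article and system prompt are sent once instead of three times.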
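When several outputs arrive in one response, it helps to check that every field is present before using them. The sketch below is one way to do that. `ArticleOutputs` and `parseArticleOutputs` are hypothetical names, and `scores` is typed as an opaque object because the original does not show its contents.

```typescript
// Shape of the combined response shown above; `scores` is left opaque
// because its contents are not specified in the original.
interface ArticleOutputs {
  headline: string;
  summary_short: string;
  summary_medium: string;
  category: string;
  scores: Record<string, unknown>;
}

const STRING_FIELDS = ["headline", "summary_short", "summary_medium", "category"] as const;

// Parses the model's JSON reply and checks that every expected field is present.
function parseArticleOutputs(raw: string): ArticleOutputs {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Expected a JSON object");
  }
  const record = data as Record<string, unknown>;
  for (const field of STRING_FIELDS) {
    if (typeof record[field] !== "string") {
      throw new Error(`Missing or non-string field: ${field}`);
    }
  }
  const scores = record["scores"];
  if (typeof scores !== "object" || scores === null || Array.isArray(scores)) {
    throw new Error("Missing or non-object field: scores");
  }
  return record as unknown as ArticleOutputs;
}
```

Failing fast on a malformed reply means one cheap retry, instead of a missing field surfacing later in the pipeline.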
Illustrative optimization path for 100 articles/day:
| Optimization | Daily Cost (illustrative) |
|---|---|
| Naive (Sonnet for everything, no caching) | ~$7.80 |
| + Batch API (50% off) | ~$3.90 |
| + Prompt caching (90% on system prompt) | ~$2.50 |
| + Model routing (Haiku for categorization) | ~$1.50 |
| + Content compaction | ~$1.00 |
These numbers show the shape of the savings, not audited production costs. The Agentic Universe pipeline's measured per-scenario numbers live in docs/architecture-notes.md → Cost Estimates. There, the Standard (100 articles/day) scenario runs ~$3.50/day today, because ingestion volume varies and not every optimization above is fully deployed.
Treat this table as a savings ladder to climb, not as a benchmark to hit.
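To see how much each rung contributes, the relative saving per step can be computed from the illustrative figures in the table. This is a small sketch using only those numbers; `stepSavings` and `totalSavings` are hypothetical helper names.

```typescript
// Daily costs in USD, copied from the illustrative table above, in order.
const DAILY_COSTS: number[] = [7.8, 3.9, 2.5, 1.5, 1.0];

// Fractional saving of each rung relative to the rung before it.
function stepSavings(costs: number[]): number[] {
  const savings: number[] = [];
  for (let i = 1; i < costs.length; i++) {
    savings.push(1 - costs[i] / costs[i - 1]);
  }
  return savings;
}

// Fractional saving of the last rung relative to the first.
function totalSavings(costs: number[]): number {
  if (costs.length < 2) return 0;
  return 1 - costs[costs.length - 1] / costs[0];
}

const steps = stepSavings(DAILY_COSTS);  // roughly 50%, 36%, 40%, 33%
const total = totalSavings(DAILY_COSTS); // roughly 87%
```

Each later rung saves a percentage of an already reduced bill, so the absolute dollar gain shrinks as you climb even when the percentage stays similar.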