Context engineering shapes what agentic AI systems know and cost
Kacper Łukawski argues that context engineering — deciding what to include, how much, and when to omit — is becoming a critical discipline for building reliable and cost-efficient agentic AI systems.
Developers building agentic pipelines should treat the context window as a finite budget — actively pruning, summarizing, and prioritizing what enters it to avoid compounding token costs and degraded reasoning across multi-step loops.
Kacper Łukawski's article on Dev.to opens with a counterintuitive premise: the fact that modern LLMs offer million-token context windows is not an invitation to fill them. An LLM has only two sources of information, its static training knowledge and whatever is passed at inference time, and context is the only lever developers can actually control. The problem is that agentic systems use up the window fast. A system prompt, tool definitions, tool call results, retrieved documents, and a few conversation turns can consume tens of thousands of tokens before an agent has done anything meaningful.
The core technical argument rests on the Transformer attention mechanism: every token attends to every other token, so the model's capacity is spread across all tokens simultaneously. Łukawski describes this as an "attention budget": irrelevant or redundant content doesn't just waste space, it actively competes with information that matters. He cites Anthropic research showing that models retrieve information and reason over long ranges less precisely at longer contexts than at shorter ones. There is also a direct cost dimension: most hosted LLMs charge per input token, and because an agentic loop re-sends its accumulated context on every iteration, a bloated context is billed again on each of the dozens of iterations a run may take.
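The cost argument can be made concrete with a little arithmetic. The sketch below is not from Łukawski's article; it assumes a simplified loop in which the full context is re-sent on every iteration and grows by a constant amount each step. The function names and all token counts are hypothetical, chosen only for illustration.

```python
def cumulative_input_tokens(base_tokens: int, added_per_step: int, steps: int) -> int:
    """Total input tokens billed when the whole context is re-sent every step."""
    total = 0
    context = base_tokens
    for _ in range(steps):
        total += context            # the full context is sent (and billed) again
        context += added_per_step   # tool results and replies are appended
    return total


def fixed_window_input_tokens(window_tokens: int, steps: int) -> int:
    """Total input tokens if the context is pruned back to a fixed size each step."""
    return window_tokens * steps


if __name__ == "__main__":
    # Hypothetical numbers: a 5,000-token starting context that gains
    # 1,000 tokens per iteration over a 30-iteration run.
    growing = cumulative_input_tokens(base_tokens=5_000, added_per_step=1_000, steps=30)
    pruned = fixed_window_input_tokens(window_tokens=5_000, steps=30)
    print(growing, pruned)
```

With these made-up numbers the growing context sends 585,000 input tokens over the run, against 150,000 for a context held at its starting size. The gap widens roughly with the square of the number of iterations, which is why long loops are where bloat hurts most.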
The article then catalogs the full set of components that fill an agent's context window: system prompts, conversation history, memory retrieved from past sessions or external stores, RAG pipeline output, and tool definitions. Łukawski positions context engineering — deciding not just what to include but how much, in what form, and what to leave out — as one of the most important emerging skills for practitioners building agentic systems.
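One way to picture the "what to include, how much, and what to leave out" decision is as packing components into a token budget by priority. The following is a minimal sketch under stated assumptions, not the approach described in the article: `ContextItem`, `estimate_tokens`, and `assemble_context` are hypothetical names, the priorities are arbitrary, and the four-characters-per-token estimate is a crude stand-in for a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    kind: str      # e.g. "system_prompt", "tools", "rag", "history", "memory"
    text: str
    priority: int  # lower number = more important (hypothetical convention)


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. A real tokenizer
    # would give more accurate counts.
    return max(1, len(text) // 4)


def assemble_context(items: list[ContextItem], budget: int) -> tuple[list[ContextItem], int]:
    """Greedily keep the most important items that still fit in the budget."""
    chosen: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            chosen.append(item)
            used += cost
        # Items that do not fit are skipped; a fuller version might
        # summarize or truncate them instead of dropping them.
    return chosen, used


if __name__ == "__main__":
    items = [
        ContextItem("rag", "r" * 400, priority=2),
        ContextItem("system_prompt", "s" * 40, priority=0),
        ContextItem("history", "h" * 40, priority=3),
        ContextItem("tools", "t" * 80, priority=1),
    ]
    chosen, used = assemble_context(items, budget=45)
    print([item.kind for item in chosen], used)
```

Here the oversized retrieved document is left out while the smaller, lower-priority history still fits. Greedy selection is only one possible policy; summarizing or truncating an item that does not fit is an equally reasonable design choice.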
Key facts
- Author Kacper Łukawski argues that larger context windows do not reliably produce better answers, and often increase costs and slow responses.
- LLMs are stateless, so agentic systems re-send the entire accumulated history from scratch on every iteration, causing context to grow like a log.
- The Transformer attention mechanism spreads model capacity across all tokens simultaneously, meaning irrelevant content actively competes with useful information.
- Anthropic research cited in the article shows models have reduced precision for information retrieval and long-range reasoning at longer contexts.