Every agent has a context window — the total text it can "see" in one turn. Models in 2026 commonly ship with 200k to 1M tokens, but how much you use matters more than the ceiling. LLM performance degrades as context fills up. Managing context is the single highest-leverage skill in agentic coding.
/clear resets the window.On a typical Claude Code session, the context absorbs:
pnpm test --verbose can be thousands of tokensA 30-minute debugging session on a moderate codebase can consume 100k+ tokens. Once you're past ~50% of the window, quality degrades.
@-imports in CLAUDE.mdDon't inline large docs in CLAUDE.md. Reference them:
# My Project
See @README.md for project overview.
See @docs/architecture.md for system design.
## Code style
- TypeScript strict, no `any`
- …The agent loads the imported file only when it needs to read it.
Sub-agents run in separate context windows and report summaries back:
You (main session): "Use a sub-agent to investigate how we handle
auth across the codebase. Return a summary of
the files touched and the session flow."
Sub-agent: [reads 40 files in its own context]
↓
[returns a 500-token summary]
You (main session): still clean.This is the single biggest win for long sessions. Claude Code, Claude Agent SDK, and OpenAI Agents SDK all support sub-agents; LangGraph models them as sub-graphs.
/clear and /compact/clear — wipes the conversation. Start fresh when switching tasks./compact — summarizes the session so far into a shorter form and continues. Use when you want to keep going on the same task but context is getting heavy./compact <instructions> — direct the summarization: "focus on the API changes" keeps what matters, drops what doesn't.Run /clear between unrelated tasks. Don't wait for auto-compact.
When asking the agent to inspect a specific function:
Bad: "Read src/services/pipeline/index.ts and tell me…"
→ loads the whole file
Good: "Read src/services/pipeline/index.ts lines 120-180 and tell me…"
→ loads ~2k tokensMost agent tools (Claude Code's Read) accept offset + limit. Use them.
Every active MCP server adds tool-schema tokens at session start. A dozen active servers can be 10k+ tokens before you've said anything. Cap at 5–6, and enable more only when the task needs them.
Claude Code: the status bar shows approximate context fill. Also: /usage surfaces current token accounting. See the custom status-line docs for richer tracking.
For agents you're building with the Agent SDK: log response.usage.input_tokens per turn and alert if it exceeds a threshold.
You'll see the agent:
Don't push through. Run /compact with specific instructions, or /clear and start a fresh session with a tighter prompt that captures what you learned.
/clear between unrelated tasks./clear more often.CLAUDE.md. A 500-line CLAUDE.md rides along in every session. Prune aggressively — Anthropic's own guidance is "ruthlessly delete lines that don't change Claude's behavior."Search for a command to run...