Compass v1.1.0 ships recall-consumption drift detection
`nautilus-compass` v1.1.0 adds a `recall_consumption.py` module that detects when an agent surfaces memory files but never actually reads their contents, closing the gap between a recall hit and genuine rule consumption.
Agentic coding pipelines that rely on memory retrieval need to verify that recalled content was actually consumed, not just that recall returned a hit. This release provides a concrete, low-overhead mechanism to catch that gap before it causes silent rule violations.
The post by chunxiaoxx describes a concrete failure that motivated `nautilus-compass` v1.1.0: a Claude Code agent recalled the correct memory file (`publisher_quality_pipeline_20260430.md`, scored 0.84), saw its title and 80-character description, then skipped the actual rules inside, running `_tmp_publish_v8.cjs` scripts with no critic round and the wrong configuration. The diagnosis was that recall surfaced the right file but the agent never consumed it. The post frames this failure as structural rather than purely the agent's fault: the shape of the recall response made it easy to act on the stub instead of the full content.
To fix this, v1.1.0 embeds the first 800 characters of post-frontmatter content directly in the top-3 recall hits, so the rules land in the agent's working context without requiring an additional `Read` call. The new `recall_consumption.py` module walks back through the live session's `Read` tool calls to determine whether recalled files were actually opened. It is wired into two places: the `drift_check` MCP tool, which runs even when the BGE daemon is unreachable because the audit is pure file traversal, and a `mid_session_hook` that fires every 25 tool calls and alerts only when 3 or more files are unconsumed and the consumption ratio is below 0.3. The module was tested on a 130 MB / 32k-line session in which 41 recall hits were surfaced and 0 were consumed.
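The alert rule described above can be sketched in a few lines. This is a minimal illustration, not the code in `recall_consumption.py`: the function names, the `ConsumptionReport` type, and the shape of the tool-call records (`{"tool": ..., "path": ...}`) are all assumptions made for the example. Only the three thresholds (every 25 tool calls, at least 3 unconsumed files, ratio below 0.3) come from the post.

```python
from dataclasses import dataclass

# Thresholds quoted in the post; the constant names are hypothetical.
CHECK_EVERY_N_TOOL_CALLS = 25
MIN_UNCONSUMED_FILES = 3
MAX_CONSUMPTION_RATIO = 0.3


@dataclass
class ConsumptionReport:
    recalled: int
    consumed: int
    unconsumed: list
    ratio: float
    alert: bool


def audit_consumption(recalled_paths, tool_calls):
    """Compare recalled files against files opened via Read tool calls.

    `tool_calls` is assumed to be a list of dicts such as
    {"tool": "Read", "path": "memory/foo.md"}; the real session log may differ.
    """
    read_paths = {
        call["path"]
        for call in tool_calls
        if call.get("tool") == "Read" and "path" in call
    }
    recalled = list(dict.fromkeys(recalled_paths))  # de-duplicate, keep order
    unconsumed = [p for p in recalled if p not in read_paths]
    consumed = len(recalled) - len(unconsumed)
    # With nothing recalled there is nothing to consume, so treat it as fully consumed.
    ratio = consumed / len(recalled) if recalled else 1.0
    # Both conditions must hold, mirroring the AND in the release notes.
    alert = len(unconsumed) >= MIN_UNCONSUMED_FILES and ratio < MAX_CONSUMPTION_RATIO
    return ConsumptionReport(len(recalled), consumed, unconsumed, ratio, alert)


def should_run_check(tool_call_count):
    """True on every 25th tool call (25, 50, 75, ...)."""
    return tool_call_count > 0 and tool_call_count % CHECK_EVERY_N_TOOL_CALLS == 0


calls = [{"tool": "Read", "path": "memory/a.md"}, {"tool": "Bash"}]
report = audit_consumption(
    ["memory/a.md", "memory/b.md", "memory/c.md", "memory/d.md"], calls
)
print(report.alert)  # → True (3 unconsumed, ratio 0.25)
```

Requiring both conditions keeps the check quiet in short sessions: two unread files out of two recalled gives a ratio of 0.0 but stays below the three-file floor, so no alert fires.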
The drift detector's alerts are also improved: previously they reported only "matched anti-anchor X with cos=0.625"; v1.1.0 alerts now embed body text from the most relevant past lesson session. On the governance side, the new `governance_plan` MCP tool reads two file-exported registries (`agents_capabilities.json` and `anchor_packs_phases.json`) to dynamically route phases to executors by capability score, replacing the static channel-dict fan-out from v1.0.0. Eval numbers remain at the v1.0.0 locked figures from 2026-05-08: LongMemEval-S 56.6% (n=500), EverMemBench-Dynamic runs of 44.4% and 47.3%, drift detector ROC AUC 0.83, and end-to-end reproduction cost of $3.50.
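To make the capability-score routing idea concrete, here is a small sketch. The post does not describe the schema of `agents_capabilities.json` or `anchor_packs_phases.json`, so the data shapes, the `route_phases` function, and the agent and phase names below are all hypothetical; the only thing taken from the source is the idea of picking an executor per phase by capability score.

```python
import json


def route_phases(capabilities, phases):
    """Assign each phase to the executor with the highest score for its capability.

    Hypothetical shapes (NOT the real registry schema):
      capabilities: {"agent_name": {"capability": score, ...}, ...}
      phases:       [{"name": "...", "capability": "..."}, ...]
    """
    plan = {}
    for phase in phases:
        need = phase["capability"]
        candidates = {
            agent: caps[need] for agent, caps in capabilities.items() if need in caps
        }
        # None marks a phase that no registered executor can handle.
        plan[phase["name"]] = max(candidates, key=candidates.get) if candidates else None
    return plan


# Inline JSON stands in for the two exported registry files.
capabilities = json.loads(
    '{"critic": {"review": 0.9, "code": 0.4}, "coder": {"code": 0.8}}'
)
phases = json.loads(
    '[{"name": "draft", "capability": "code"},'
    ' {"name": "critique", "capability": "review"},'
    ' {"name": "deploy", "capability": "ops"}]'
)
print(route_phases(capabilities, phases))
# → {'draft': 'coder', 'critique': 'critic', 'deploy': None}
```

Compared with a static channel dictionary, a lookup like this lets the plan change when the registry files change, without editing the routing code.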
Key facts
- v1.1.0 adds `recall_consumption.py`, which audits live session `Read` tool calls to detect recalled files that were never actually opened by the agent.
- Top-3 recall hits now embed the first 800 characters of post-frontmatter content directly, so rules appear in the agent's working context without an extra `Read` call.
- The consumption checker fires via `mid_session_hook` every 25 tool calls, alerting only when ≥3 files are unconsumed AND the consumption ratio is below 0.3.
- Tested on a 130 MB / 32k-line session: 41 recall hits surfaced, 0 consumed.
- Drift detector alerts now embed body text from the most relevant past lesson session, replacing the previous bare cosine-score message.
- New `governance_plan` MCP tool uses `agents_capabilities.json` and `anchor_packs_phases.json` registries to dynamically route DAG phases to executors by capability score.
- Eval numbers are unchanged from the v1.0.0 locked figures (2026-05-08): LongMemEval-S 56.6%, EverMemBench-Dynamic 44.4% / 47.3%, drift detector ROC AUC 0.83, reproduction cost $3.50.
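The excerpt embedding for top-3 recall hits can be illustrated as follows. This is a sketch under stated assumptions rather than the project's code: it assumes frontmatter is a leading block delimited by `---` lines, that each hit is a dict with `path` and `score` keys, and that file contents come from a caller-supplied `read_text` callable. The function and constant names are hypothetical; the 800-character limit and the top-3 cutoff come from the post.

```python
EXCERPT_CHARS = 800       # from the release notes
TOP_K_WITH_EXCERPT = 3    # from the release notes


def strip_frontmatter(text):
    """Drop a leading frontmatter block delimited by '---' lines, if present."""
    lines = text.splitlines(keepends=True)
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "".join(lines[i + 1:])
    return text  # no frontmatter, or unterminated block: leave untouched


def attach_excerpts(hits, read_text):
    """Return copies of `hits`, best score first, with an 'excerpt' on the top 3.

    `hits` is assumed to be a list of dicts with 'path' and 'score' keys;
    `read_text` is any callable mapping a path to the file's contents.
    """
    ranked = sorted(hits, key=lambda h: h["score"], reverse=True)
    enriched_hits = []
    for rank, hit in enumerate(ranked):
        enriched = dict(hit)  # copy so the caller's hit is not mutated
        if rank < TOP_K_WITH_EXCERPT:
            body = strip_frontmatter(read_text(hit["path"])).lstrip()
            enriched["excerpt"] = body[:EXCERPT_CHARS]
        enriched_hits.append(enriched)
    return enriched_hits
```

Passing `read_text` in as a parameter keeps the sketch testable without touching the filesystem; in practice it could be something like `lambda p: pathlib.Path(p).read_text(encoding="utf-8")`.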
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 7, 2026 · 12:45 UTC.