Jun 7, 2026 · 1 min read · New Models & Releases

Compass v1.1.0 ships recall-consumption drift detection

`nautilus-compass` v1.1.0 adds a `recall_consumption.py` module that detects when an agent surfaces memory files but never actually reads their contents, closing the gap between a recall hit and genuine rule consumption.

Dev.to #llm · chunxiaoxx


Composite score: 4.7 out of 10 (weights: Novelty 25%, Impact 43%, Credibility 12%, Depth 20%)

Why it matters

Agentic coding pipelines that rely on memory retrieval need to verify that retrieved content is actually consumed, not just that recall found the right file. This release provides a concrete, low-overhead mechanism to catch that gap before it causes silent rule violations.

Summary: our read of the original

The post by chunxiaoxx describes a concrete failure that motivated `nautilus-compass` v1.1.0. A Claude Code agent recalled the correct memory file (`publisher_quality_pipeline_20260430.md`, scored 0.84) and saw its title and 80-character description, but then skipped the rules inside it, running `_tmp_publish_v8.cjs` scripts with no critic round and the wrong configuration. The diagnosis: recall surfaced the right file, but the agent never consumed it. The post frames this as a structural failure rather than purely the agent's fault, because the shape of the recall response made it easy to act on the stub instead of the full content.

To fix this, v1.1.0 embeds the first 800 characters of post-frontmatter content directly in the top-3 recall hits, so the rules land in the agent's working context without requiring an additional `Read` call.
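A minimal Python sketch of the preview-embedding step follows. It assumes recall hits arrive as dicts with a `path` key, already sorted by score, and that memory files use `---`-delimited YAML frontmatter; `embed_previews`, `body_preview`, and `read_file` are hypothetical names, not the Compass API. Only the 800-character limit and the top-3 cutoff come from the release notes.

```python
import re
from collections.abc import Callable

PREVIEW_CHARS = 800  # per the release notes
TOP_K = 3            # only the top-3 hits get an inline body

# A leading '---' line, anything (lazily), then the next line that is exactly '---'.
FRONTMATTER_RE = re.compile(r"\A---[ \t]*\n.*?^---[ \t]*(?:\n|\Z)", re.DOTALL | re.MULTILINE)


def strip_frontmatter(text: str) -> str:
    """Remove a leading YAML frontmatter block, if the file has one."""
    return FRONTMATTER_RE.sub("", text, count=1)


def embed_previews(hits: list[dict], read_file: Callable[[str], str]) -> list[dict]:
    """Copy the hits, attaching the first PREVIEW_CHARS of body text to the top TOP_K."""
    enriched = []
    for rank, hit in enumerate(hits):
        hit = dict(hit)  # leave the caller's dicts untouched
        if rank < TOP_K:
            body = strip_frontmatter(read_file(hit["path"])).lstrip()
            hit["body_preview"] = body[:PREVIEW_CHARS]
        enriched.append(hit)
    return enriched
```

Passing `read_file` in as a function keeps the sketch testable without touching disk; in practice it could be something like `lambda p: Path(p).read_text(encoding="utf-8")`. Hits below the top three stay as title-and-description stubs, which bounds how much extra text each recall adds to the context.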

The new `recall_consumption.py` module walks back through the live session's `Read` tool calls to determine whether recalled files were actually opened. It is wired into two entry points: the `drift_check` MCP tool, where the audit runs even when the BGE daemon is unreachable because it is pure file traversal, and a `mid_session_hook` that fires every 25 tool calls and alerts only when 3 or more files are unconsumed and the consumption ratio is below 0.3. On a 130 MB / 32k-line test session, the module found 41 recall hits surfaced and 0 consumed.
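The audit and its alert rule can be sketched as below. This is an illustration, not `recall_consumption.py` itself: it assumes a JSONL session transcript in which each tool call appears as a `tool_use` content block with `name` and `input.file_path`, matches files by base name only, and ignores whether a `Read` happened before or after the recall. `audit`, `maybe_check`, and `ConsumptionReport` are hypothetical names; the thresholds (3 or more unconsumed, ratio below 0.3, every 25 tool calls) are the ones the post reports.

```python
import json
from dataclasses import dataclass
from pathlib import PurePath

MIN_UNCONSUMED = 3  # alert only if at least this many recalled files were never Read
MAX_RATIO = 0.3     # ...and the consumed/recalled ratio is below this
HOOK_EVERY = 25     # run the check once every N tool calls


@dataclass
class ConsumptionReport:
    recalled: set[str]
    consumed: set[str]

    @property
    def unconsumed(self) -> set[str]:
        return self.recalled - self.consumed

    @property
    def ratio(self) -> float:
        # No recalls means nothing to miss, so report full consumption.
        return len(self.consumed) / len(self.recalled) if self.recalled else 1.0

    def should_alert(self) -> bool:
        return len(self.unconsumed) >= MIN_UNCONSUMED and self.ratio < MAX_RATIO


def read_paths(jsonl_lines):
    """Yield file_path for every Read tool_use block in a JSONL transcript (assumed schema)."""
    for line in jsonl_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # a live session file may end in a partially written line
        message = entry.get("message") if isinstance(entry, dict) else None
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue
        for block in content:
            if not isinstance(block, dict):
                continue
            if block.get("type") == "tool_use" and block.get("name") == "Read":
                args = block.get("input")
                path = args.get("file_path") if isinstance(args, dict) else None
                if path:
                    yield path


def audit(recalled_paths, jsonl_lines) -> ConsumptionReport:
    """Compare recalled files with files the agent actually Read (matched by base name)."""
    recalled = {PurePath(p).name for p in recalled_paths}
    read = {PurePath(p).name for p in read_paths(jsonl_lines)}
    return ConsumptionReport(recalled=recalled, consumed=recalled & read)


def maybe_check(tool_calls_so_far, recalled_paths, jsonl_lines):
    """Hypothetical mid-session hook: audit on every HOOK_EVERY-th call, return an alert or None."""
    if tool_calls_so_far <= 0 or tool_calls_so_far % HOOK_EVERY:
        return None
    report = audit(recalled_paths, jsonl_lines)
    if not report.should_alert():
        return None
    return (f"{len(report.unconsumed)} recalled files never Read "
            f"(consumption ratio {report.ratio:.2f})")
```

With 41 recalled files and no matching `Read` calls, as in the post's test session, the ratio is 0/41 = 0.0 and the 25th tool call produces an alert. Requiring both conditions keeps the hook quiet when only one or two recalled files go unread, or when most recalls were consumed.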

The drift detector's alerts are also richer. Previously they reported only "matched anti-anchor X with cos=0.625"; in v1.1.0 they also embed body text from the most relevant past lesson session.

On the governance side, the new `governance_plan` MCP tool reads two file-exported registries, `agents_capabilities.json` and `anchor_packs_phases.json`, and routes phases to executors dynamically by capability score, replacing the static channel-dict fan-out used in v1.0.0.

Eval numbers remain at the v1.0.0 figures locked on 2026-05-08: LongMemEval-S 56.6% (n=500), two EverMemBench-Dynamic runs at 44.4% and 47.3%, drift detector ROC AUC 0.83, and an end-to-end reproduction cost of $3.50.
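Capability-score routing of the kind `governance_plan` performs might look roughly like the sketch below. The post does not publish the registry schemas, so this assumes `agents_capabilities.json` maps each agent to per-capability scores and `anchor_packs_phases.json` lists phases with `name`, `needs`, and `depends_on` fields. The scoring rule (sum of an agent's scores over the phase's needed capabilities, ties broken alphabetically) and the names `load_registry`, `fit`, and `plan` are likewise hypothetical. The standard library's `graphlib.TopologicalSorter` orders the phase DAG.

```python
import json
from graphlib import TopologicalSorter
from pathlib import Path


def load_registry(path: str):
    """Read one of the JSON registry files (schema assumed, see lead-in)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def fit(agent_caps: dict[str, float], needs: list[str]) -> float:
    """Hypothetical fitness: sum of the agent's scores for each needed capability."""
    return sum(agent_caps.get(cap, 0.0) for cap in needs)


def plan(agents: dict[str, dict[str, float]], phases: list[dict]) -> list[tuple[str, str]]:
    """Return (phase, executor) pairs in an order that respects depends_on edges."""
    by_name = {p["name"]: p for p in phases}
    graph = {name: set(p.get("depends_on", [])) for name, p in by_name.items()}
    routed = []
    # static_order() raises graphlib.CycleError if the phases are not a DAG.
    for name in TopologicalSorter(graph).static_order():
        if name not in by_name:
            raise ValueError(f"phase {name!r} is referenced but not defined")
        needs = by_name[name].get("needs", [])
        # sorted() makes ties deterministic: max() keeps the first maximum it sees.
        best = max(sorted(agents), key=lambda a: fit(agents[a], needs))
        routed.append((name, best))
    return routed
```

Given a `coder` agent strong on `code` and a `critic` agent strong on `review`, a `draft` phase followed by a dependent `critic_round` phase would route to `coder` first and `critic` second. Unlike a static fan-out table, changing a score in the capabilities file changes the routing without touching code.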

Key facts

01 v1.1.0 adds `recall_consumption.py`, which audits live session `Read` tool calls to detect recalled files that were never actually opened by the agent.
02 Top-3 recall hits now embed the first 800 characters of post-frontmatter content directly, so rules appear in the agent's working context without an extra `Read` call.
03 The consumption checker fires via `mid_session_hook` every 25 tool calls, alerting only when ≥3 files are unconsumed AND the consumption ratio is below 0.3.
04 Tested on a 130 MB / 32k-line session: 41 recall hits surfaced, 0 consumed.
05 Drift detector alerts now embed body text from the most relevant past lesson session, replacing the previous bare cosine-score message.
06 New `governance_plan` MCP tool uses `agents_capabilities.json` and `anchor_packs_phases.json` registries to dynamically route DAG phases to executors by capability score.
07 Eval numbers are unchanged from the v1.0.0 locked figures (2026-05-08): LongMemEval-S 56.6%, EverMemBench-Dynamic 44.4% / 47.3%, drift detector ROC AUC 0.83, reproduction cost $3.50.
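The alert change in item 05 amounts to attaching the nearest past lesson to the bare score message. A hedged sketch follows, assuming lessons are stored with a session id, body text, and an embedding vector; the post does not say how Compass stores or ranks them, and `Lesson`, `enriched_alert`, and the 300-character snippet length are hypothetical. Plain cosine similarity stands in for whatever relevance measure the real detector uses.

```python
import math
from dataclasses import dataclass


@dataclass
class Lesson:
    session_id: str
    body: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def enriched_alert(anchor: str, cos: float, context_vec: list[float],
                   lessons: list[Lesson], max_chars: int = 300) -> str:
    """Old-style score line, plus body text from the closest past lesson if one exists."""
    message = f"matched anti-anchor {anchor} with cos={cos:.3f}"
    if not lessons:
        return message  # nothing to attach, so fall back to the bare score line
    best = max(lessons, key=lambda lesson: cosine(context_vec, lesson.embedding))
    snippet = best.body[:max_chars].rstrip()
    return f"{message}\nclosest past lesson ({best.session_id}):\n{snippet}"
```

The first line keeps the v1.0.0-style score so existing log parsing would still match it; the lesson text below it gives the agent the actual rule it is drifting from rather than just a similarity number.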

Topics

#agent-framework #memory-management #mcp #tool-use

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 7, 2026 · 12:45 UTC.
