Every processed story in reverse chronological order, with the newest coverage first. Filter by tag, source, or score to drill in.
The project demonstrates a self-running, bidirectional loop between a browser-based AI chat and a local coding agent, removing the manual handoff that normally separates planning in Claude.ai from execution in Claude Code.
IntentProbe addresses a gap the post identifies in existing MCP security tooling: the inability of text-based classifiers to distinguish safe from poisoned tool descriptions when both use nearly identical vocabulary, a scenario where the post reports the strongest reproducible DeBERTa baseline scored 0% recall.
The experiment provides concrete token-count measurements showing that schema design and output pruning, rather than model choice, are the dominant levers for reducing MCP call costs, with unpruned tool output alone accounting for 35–40% of total token overhead.