TRACE compiles user corrections into runtime checks for coding agents
Researchers introduce TRACE, a pipeline that mines user corrections from chat, rewrites them as atomic rules, and compiles them into runtime enforcement checks that coding agents must pass before completing future tasks.
TRACE directly addresses the repeated-friction failure mode, in which users must restate the same correction across sessions. Memory-based approaches alone do not reliably close this gap.
Yujun Zhou, Kehan Guo, and Haomin Zhuang present TRACE (Test-time Rule Acquisition and Compiled Enforcement), a system designed to close the gap between an agent having access to a user's preferences and actually complying with them. The core problem the paper identifies is that corrections made in one session are frequently violated in later sessions, a failure mode that memory-based approaches do not reliably solve. On tasks derived from anonymized real-user friction cases, the authors show that even with Mem0 memory, 57.5% of applicable preference checks are still violated, which motivates a different approach.
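The 57.5% figure is a rate over applicable checks: a preference check counts only on tasks where that preference actually applies. The following is a minimal Python sketch of that computation, assuming a hypothetical `CheckResult` record per (task, rule) pair. The paper's actual scoring harness may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one preference check on one task (hypothetical record format)."""
    rule_id: str
    applicable: bool  # does this preference apply to the task at all?
    passed: bool      # only meaningful when applicable is True


def violation_rate(results: list[CheckResult]) -> float:
    """Percentage of applicable checks that were violated.

    Checks that do not apply to a task are excluded from both the numerator
    and the denominator. Returns 0.0 when nothing applies.
    """
    applicable = [r for r in results if r.applicable]
    if not applicable:
        return 0.0
    violated = sum(1 for r in applicable if not r.passed)
    return 100.0 * violated / len(applicable)


if __name__ == "__main__":
    # Illustrative counts only: 23 of 40 applicable checks fail,
    # plus one check that does not apply and is therefore ignored.
    results = [CheckResult(f"rule-{i}", True, passed=i >= 23) for i in range(40)]
    results.append(CheckResult("unrelated-rule", False, passed=False))
    print(violation_rate(results))  # 57.5
```

The counts in the demo were chosen only to reproduce the arithmetic of a 57.5% rate. They are not the paper's data. Excluding non-applicable checks keeps the rate from being diluted by preferences that a given task never exercises.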
TRACE operates as a drop-in skill-layer pipeline: it mines corrections from user chat, rewrites them as atomic rules, and compiles those rules into runtime checks that must pass before an agent completes a future task. Crucially, unlike developer-written runtime checks, TRACE rules originate from the user's own corrections. The system is evaluated using simulated user-in-the-loop experiments on two benchmarks: ClawArena coding-agent tasks and MemoryArena-derived memory-intensive tasks. On ClawArena, TRACE reduces preference violation from 100.0% to 37.6% on in-distribution tasks and from 100.0% to 2.0% on out-of-distribution tasks. On MemoryArena-derived tasks, in-distribution violation drops from 100.0% to 60.5%, while task pass rate matches or exceeds the strongest memory baseline.
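The mine, rewrite, compile, and gate steps described above can be illustrated with a toy Python sketch. This is not the TRACE or `tellonce` implementation, and the summary does not say how mining or rewriting is done. Here, keyword cues stand in for correction mining and backtick-quoted tokens stand in for rule targets. Every name (`mine_corrections`, `atomize`, `compile_rule`, `gate_completion`, `Rule`) is hypothetical. The sketch shows only the shape of the idea: a correction becomes atomic rules, each rule becomes an executable check, and completion is blocked while any check fails.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical cue phrases; a stand-in for whatever correction miner TRACE really uses.
CORRECTION_CUES = ("don't", "do not", "never", "always", "please use", "stop")
NEGATION_CUES = ("don't", "do not", "never", "stop")


@dataclass(frozen=True)
class Rule:
    """One atomic rule: a single pattern the agent's output must (or must not) contain."""
    text: str
    pattern: str
    must_match: bool


Check = Callable[[str], bool]


def mine_corrections(chat: list[str]) -> list[str]:
    """Step 1: keep user messages that look like corrections (toy keyword heuristic)."""
    return [m for m in chat if any(cue in m.lower() for cue in CORRECTION_CUES)]


def atomize(correction: str) -> list[Rule]:
    """Step 2: split a correction into clauses and turn each checkable clause into a Rule."""
    rules = []
    for clause in re.split(r";|\band\b", correction):
        clause = clause.strip()
        target = re.search(r"`([^`]+)`", clause)
        if target is None:
            continue  # nothing concrete to check in this toy version
        negative = any(cue in clause.lower() for cue in NEGATION_CUES)
        rules.append(Rule(clause, re.escape(target.group(1)), must_match=not negative))
    return rules


def compile_rule(rule: Rule) -> Check:
    """Step 3: compile a Rule into a predicate over the agent's final output."""
    regex = re.compile(rule.pattern)
    if rule.must_match:
        return lambda output: regex.search(output) is not None
    return lambda output: regex.search(output) is None


def gate_completion(output: str, rules: list[Rule]) -> list[str]:
    """Step 4: return the text of every violated rule; an empty list means the task may complete."""
    return [rule.text for rule in rules if not compile_rule(rule)(output)]


if __name__ == "__main__":
    chat = [
        "Can you add a retry loop?",
        "Never use `print` for diagnostics; always use `logging`.",
    ]
    rules = [r for c in mine_corrections(chat) for r in atomize(c)]
    print(gate_completion('print("retrying")', rules))         # both rules violated
    print(gate_completion('logging.info("retrying")', rules))  # []
```

In the demo, only the second chat message is mined. It splits into two atomic rules, and the agent may finish only once its output satisfies both. A realistic system would also need to decide when a rule applies to a given task, which this sketch deliberately omits.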
The authors frame TRACE as addressing a "repeated-friction failure mode": the need for users to restate the same correction across multiple sessions. Experiment code is available at `https://github.com/YujunZhou/TRACE_exp`, and the deployable skill is available at `https://github.com/YujunZhou/tellonce`.
Key facts
- 01 Mem0 memory leaves 57.5% of applicable preference checks violated in tasks derived from anonymized real-user friction cases.
- 02 TRACE stands for Test-time Rule Acquisition and Compiled Enforcement.
- 03 TRACE mines user chat corrections, rewrites them as atomic rules, and compiles them into runtime checks that agents must pass before completing tasks.
- 04 On ClawArena, TRACE reduces preference violation from 100.0% to 37.6% (in-distribution) and from 100.0% to 2.0% (out-of-distribution).
- 05 On MemoryArena-derived tasks, TRACE reduces in-distribution violation from 100.0% to 60.5% while matching or exceeding the strongest memory baseline on task pass rate.
- 06 Unlike developer-written runtime checks, TRACE rules originate from the user's own corrections.
- 07 Experiment code and a deployable skill are available on GitHub at YujunZhou/TRACE_exp and YujunZhou/tellonce.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 12, 2026 · 10:05 UTC.