TMEM framework lets LLM agents learn via in-episode LoRA updates
Researchers introduce TMEM, a parametric memory framework that updates fast LoRA weights during a single episode so agents genuinely alter their behavior from experience rather than merely retrieving past text.
TMEM demonstrates that agent parameters can be updated within a single episode via online LoRA adaptation, overcoming the permanent information loss that affects all prompt-only memory approaches.
Tao Ren, Weiyao Luo, and Hui Yang identify a fundamental limitation in current memory-augmented LLM agents: they store all past experience as textual summaries or retrieved passages in the prompt while keeping model parameters frozen. As a result, agents can look up prior information but cannot learn from it: anything dropped from the context window is permanently lost, and experience never updates the policy itself.
To address this, the authors introduce TMEM, a self-evolving parametric memory framework. TMEM formalizes agent behavior as an agentic decision process with fast-weight rollout dynamics, where actions are sampled from `π_{θ_0+Δ_t}`. Extraction actions produce supervision that updates the fast LoRA weights `Δ_t` for subsequent decisions within the same episode, genuinely altering future behavior. Because the extraction policy is directly optimizable by reinforcement learning, training the base weights `θ_0` simultaneously improves task-level actions and the quality of supervision data used for online LoRA adaptation. The authors additionally propose SVD-based initialization of the LoRA subspace to accelerate online convergence.
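The fast-weight dynamics described above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a single linear layer standing in for the policy, a squared-error loss standing in for the distilled supervision objective, and plain SGD for the online update. The names `forward`, `lora_step`, and `rollout`, and the `("extract", x, y)` / `("act", x)` event format, are hypothetical and chosen only for illustration.

```python
import numpy as np

def forward(W0, A, B, x):
    """Output of the adapted layer: (W0 + B @ A) @ x, with W0 kept frozen."""
    return (W0 + B @ A) @ x

def lora_step(W0, A, B, x, y, lr):
    """One SGD step on 0.5 * ||forward(x) - y||^2, updating only A and B."""
    err = forward(W0, A, B, x) - y
    grad_B = np.outer(err, A @ x)      # dL/dB = err (A x)^T
    grad_A = np.outer(B.T @ err, x)    # dL/dA = (B^T err) x^T
    return A - lr * grad_A, B - lr * grad_B

def rollout(W0, A, B, events, lr=0.1):
    """Run one episode. 'extract' events update the fast weights (toy
    stand-in for supervision produced by extraction actions); 'act'
    events are answered with the fast weights accumulated so far."""
    outputs = []
    for event in events:
        if event[0] == "extract":
            _, x, y = event
            A, B = lora_step(W0, A, B, x, y, lr)
        else:
            _, x = event
            outputs.append(forward(W0, A, B, x))
    return outputs, A, B

# Toy episode: the frozen layer maps x to x, but the episode teaches x -> y.
W0 = np.eye(2)
A = np.array([[1.0, 0.0]])   # rank-1 LoRA factor (assumed initial value)
B = np.zeros((2, 1))         # zero init, so the delta B @ A starts at 0
x = np.array([1.0, 0.0])
y = np.array([2.0, 0.0])
events = [("act", x)] + [("extract", x, y)] * 100 + [("act", x)]
outputs, A, B = rollout(W0, A, B, events)
```

In this toy episode the first `act` output comes from the frozen weights alone, while the last one reflects the supervision absorbed in between, even though `W0` itself never changes.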
Experiments across four benchmarks (LoCoMo, LongMemEval-S, multi-objective search, and CL-Bench) show that TMEM consistently outperforms both summary-based and retrieval-based memory baselines at multiple model scales.
Key facts
- 01 Existing memory-augmented LLM agents store experience only as text in the prompt, keeping model parameters frozen throughout a rollout.
- 02 TMEM absorbs distilled supervision into fast LoRA weights `Δ_t` via lightweight online updates within a single episode.
- 03 Agent actions are sampled from `π_{θ_0+Δ_t}`, with extraction actions producing supervision that updates `Δ_t` for subsequent decisions.
- 04 The extraction policy is directly optimizable by RL, so training `θ_0` improves both task actions and online LoRA adaptation data quality.
- 05 SVD-based initialization of the LoRA subspace is proposed to accelerate online convergence.
- 06 TMEM is evaluated on LoCoMo, LongMemEval-S, multi-objective search, and CL-Bench.
- 07 TMEM consistently outperforms summary-based and retrieval-based baselines across different model scales.
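The summary does not say which matrix TMEM decomposes or how the LoRA factors are derived from it, so the following is only one plausible reading of "SVD-based initialization of the LoRA subspace", sketched under stated assumptions: the frozen weight matrix is decomposed, its top right singular vectors seed the down-projection `A`, and the up-projection `B` starts at zero so the adapter initially contributes nothing. The function name `svd_init_lora` is hypothetical.

```python
import numpy as np

def svd_init_lora(W0, rank):
    """Seed a rank-`rank` LoRA subspace from the frozen weight W0.

    Assumption (not confirmed by the source): A takes the top right
    singular vectors of W0, and B is zero so that B @ A == 0 at the start.
    """
    # Thin SVD: W0 = U @ diag(S) @ Vt, singular values in descending order.
    U, S, Vt = np.linalg.svd(W0, full_matrices=False)
    A = Vt[:rank].copy()                  # (rank, d_in), orthonormal rows
    B = np.zeros((W0.shape[0], rank))     # (d_out, rank)
    return A, B

# Example: the two dominant input directions of a diagonal weight matrix.
W0 = np.diag([3.0, 2.0, 1.0])
A, B = svd_init_lora(W0, rank=2)
```

The intuition behind this choice is that the rows of `A` start out orthonormal and aligned with directions the base layer already responds to most strongly; whether this matches the paper's procedure would need to be checked against the original article.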
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 9, 2026 · 17:05 UTC.