Multi-factor memory value model outperforms recency and similarity for LLM agents
Researchers Zhibao Chen and Qian Cheng propose a seven-factor, cognitively grounded memory value function V(m) that significantly outperforms recency and semantic similarity baselines for deciding what long-running LLM agents should remember, forget, or retrieve.
The work shows that a learned, cognitively grounded multi-factor value function substantially outperforms the recency and semantic-similarity heuristics currently used in production agent memory systems, and exposes a methodological flaw in how LongMemEval is commonly evaluated.
Zhibao Chen and Qian Cheng identify a fundamental mismatch in how production LLM agent systems handle memory: semantic similarity and recency are both mis-specified for the forgetting decision, which must be made at consolidation time before any future query is known. To address this, they propose V(m) = Σᵢ wᵢ fᵢ(m), a linear multi-factor value function over seven cognitively grounded factors — emotional intensity, goal relevance, value alignment, self/user relevance, task utility, reliability, and usage history. The weights are learned from a downstream objective using a gradient-free optimizer, and the resulting single scalar uniformly governs encoding depth, forget risk, and retrieval rank.
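A minimal Python sketch of the linear value function and of gradient-free weight learning, under stated assumptions. The summary does not name the optimizer, the factor scoring functions, or the forgetting rule, so the following are all hypothetical: factor scores precomputed in [0, 1], a fixed-size retention budget, gold-evidence retention as the objective, and plain random search standing in for the authors' gradient-free optimizer. The names `value`, `retain`, `retention`, `random_search`, and `make_memory` are illustrative and are not taken from the paper's code.

```python
import random

# The seven factors named in the summary; the key names are assumptions.
FACTORS = [
    "emotional_intensity", "goal_relevance", "value_alignment",
    "self_user_relevance", "task_utility", "reliability", "usage_history",
]

def make_memory(gold, **scores):
    """Build a toy memory record: factor scores default to 0.0."""
    m = {name: 0.0 for name in FACTORS}
    m.update(scores)
    m["gold"] = gold  # True if this memory is gold evidence
    return m

def value(memory, weights):
    """V(m) = sum_i w_i * f_i(m) over the seven factor scores."""
    return sum(weights[name] * memory[name] for name in FACTORS)

def retain(memories, weights, budget):
    """Keep the `budget` highest-valued memories; the rest are forgotten."""
    ranked = sorted(memories, key=lambda m: value(m, weights), reverse=True)
    return ranked[:budget]

def retention(memories, weights, budget):
    """Fraction of gold-evidence memories that survive forgetting."""
    n_gold = sum(1 for m in memories if m["gold"])
    if n_gold == 0:
        return 0.0
    kept = retain(memories, weights, budget)
    return sum(1 for m in kept if m["gold"]) / n_gold

def random_search(memories, budget, iters=200, seed=0):
    """Gradient-free weight learning via random search (a stand-in only)."""
    rng = random.Random(seed)
    best_w = {name: 1.0 for name in FACTORS}  # start from uniform weights
    best_score = retention(memories, best_w, budget)
    for _ in range(iters):
        cand = {name: rng.uniform(-1.0, 1.0) for name in FACTORS}
        score = retention(memories, cand, budget)
        if score > best_score:
            best_w, best_score = cand, score
    return best_w, best_score
```

Because the search starts from uniform weights and only accepts improvements, the learned weights can never score below the uniform baseline on the data they were fit to. Held-out evaluation, as in the reported results, is a separate step that this sketch does not cover.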
On LongMemEval in the realistic blind regime, the learned model achieves 0.770 ± 0.011 gold-evidence retention across 479 usable cases, versus 0.657 for uniform weights, 0.518 for the best single factor, and 0.368 for recency — with every paired gap's 95% bootstrap confidence interval above zero. A neural network over the same factors ties the linear model, suggesting the linear formulation captures the relevant structure. The learned weights are interpretable: reliability, emotional intensity, and self/user relevance dominate, while query-time goal similarity is correctly down-weighted for the forgetting decision.
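The claim that every paired gap has a 95% bootstrap confidence interval above zero can be illustrated with a short sketch. The summary does not say which bootstrap variant the authors used, so this assumes a percentile bootstrap over per-case score differences between two methods evaluated on the same cases. The function name `paired_bootstrap_ci` and its parameters are hypothetical.

```python
import random

def paired_bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the mean of per-case differences a_i - b_i.

    `a` and `b` hold the scores of two methods on the same cases, so
    cases are resampled jointly (with replacement) to keep the pairing.
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    rng = random.Random(seed)
    n = len(diffs)
    means = []
    for _ in range(n_boot):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

A gap counts as "above zero" in this sense when the returned lower bound `lo` is positive.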
The paper also makes a pointed methodological contribution: scoring goal relevance against the held-out evaluation question saturates gold-evidence retention at approximately 0.98 on LongMemEval, but this measures retrieval, not forgetting — an important distinction the authors argue prior work conflates. A controlled synthetic task with planted confounds further validates the approach: the learned weighting achieves 1.00 retention where uniform weighting fails at 0.62. The full substrate is open-source and all experiments run on a single CPU with no API calls.
Key facts
- 01 Proposes V(m) = Σᵢ wᵢ fᵢ(m), a multi-factor memory value function with seven factors drawn from cognitive psychology
- 02 Seven factors: emotional intensity, goal relevance, value alignment, self/user relevance, task utility, reliability, and usage history
- 03 Learned multi-factor model retains 0.770 ± 0.011 of gold evidence on LongMemEval (blind regime, 479 usable cases)
- 04 Baselines: uniform weights 0.657, best single factor 0.518, recency 0.368; all gaps have 95% bootstrap CIs above zero
- 05 A neural network over the same factors ties the linear model in performance
- 06 Scoring goal relevance against the held-out evaluation question saturates retention at ~0.98, which measures retrieval rather than forgetting; flagged as a methodological pitfall
- 07 All experiments run on a single CPU with no API calls; substrate is open-source
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 12, 2026 · 10:05 UTC.