★ Rank 18 today · Jun 11, 2026 · 1 min read · Research Papers

Multi-factor memory value model outperforms recency and similarity for LLM agents

Researchers Zhibao Chen and Qian Cheng propose a seven-factor, cognitively grounded memory value function V(m) that significantly outperforms recency and semantic similarity baselines for deciding what long-running LLM agents should remember, forget, or retrieve.

arXiv · Zhibao Chen, Qian Cheng

Composite score: 6.6 out of 10 (rank 18)

Weights applied: Novelty 25%, Impact 43%, Credibility 12%, Depth 20%

Why it matters

The work shows that a learned, cognitively grounded multi-factor value function substantially outperforms the recency and semantic-similarity heuristics currently used in production agent memory systems, and exposes a methodological flaw in how LongMemEval is commonly evaluated.


Summary: our read of the original

Zhibao Chen and Qian Cheng identify a fundamental mismatch in how production LLM agent systems handle memory: semantic similarity and recency are both mis-specified for the forgetting decision, which must be made at consolidation time before any future query is known. To address this, they propose V(m) = Σᵢ wᵢ fᵢ(m), a linear multi-factor value function over seven cognitively grounded factors — emotional intensity, goal relevance, value alignment, self/user relevance, task utility, reliability, and usage history. The weights are learned from a downstream objective using a gradient-free optimizer, and the resulting single scalar uniformly governs encoding depth, forget risk, and retrieval rank.
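
As a rough illustration of the scoring rule above, the sketch below computes V(m) as a weighted sum over the seven named factors and keeps the highest-value memories under a fixed budget. The `Memory` class, the `retain_top_k` helper, the [0, 1] factor scale, and every numeric weight and score are hypothetical; the paper's factor extractors and learned weights are not reproduced here.

```python
from dataclasses import dataclass

# Factor names follow the summary; all numeric values below are made up.
FACTORS = (
    "emotional_intensity",
    "goal_relevance",
    "value_alignment",
    "self_user_relevance",
    "task_utility",
    "reliability",
    "usage_history",
)


@dataclass
class Memory:
    text: str
    factors: dict  # factor name -> score, assumed here to lie in [0, 1]


def memory_value(memory, weights):
    """V(m) = sum_i w_i * f_i(m); factors missing from a memory count as 0."""
    return sum(weights[name] * memory.factors.get(name, 0.0) for name in FACTORS)


def retain_top_k(memories, weights, k):
    """Keep the k highest-value memories; the rest are candidates for forgetting."""
    ranked = sorted(memories, key=lambda m: memory_value(m, weights), reverse=True)
    return ranked[:k]


# Hypothetical weights that favor reliability, emotion, and self/user relevance.
example_weights = {name: 0.0 for name in FACTORS}
example_weights.update(
    reliability=0.5, emotional_intensity=0.3, self_user_relevance=0.2
)

memories = [
    Memory(
        "User's daughter is allergic to peanuts",
        {"reliability": 0.9, "emotional_intensity": 0.8, "self_user_relevance": 1.0},
    ),
    Memory("Weather was cloudy on Tuesday", {"reliability": 0.9, "goal_relevance": 0.4}),
    Memory(
        "Unverified rumor about a product launch",
        {"reliability": 0.1, "emotional_intensity": 0.2},
    ),
]

kept = retain_top_k(memories, example_weights, k=2)
```

Because the score does not depend on any query, it can be computed at consolidation time, which is the setting the authors argue the forgetting decision must be made in.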

On LongMemEval in the realistic blind regime, the learned model achieves 0.770 ± 0.011 gold-evidence retention across 479 usable cases, versus 0.657 for uniform weights, 0.518 for the best single factor, and 0.368 for recency — with every paired gap's 95% bootstrap confidence interval above zero. A neural network over the same factors ties the linear model, suggesting the linear formulation captures the relevant structure. The learned weights are interpretable: reliability, emotional intensity, and self/user relevance dominate, while query-time goal similarity is correctly down-weighted for the forgetting decision.
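
To make the confidence-interval claim concrete, here is a minimal sketch of a paired bootstrap over per-case retention scores. The summary does not describe the exact resampling procedure, so the percentile method, the function name `paired_bootstrap_ci`, and its default parameters are assumptions.

```python
import random


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of (scores_a - scores_b) over paired cases."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample cases with replacement, keeping each pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

A gap between two methods counts as "above zero" in this sense when the lower bound returned for their per-case differences is positive.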

The paper also makes a pointed methodological contribution: scoring goal relevance against the held-out evaluation question saturates gold-evidence retention at approximately 0.98 on LongMemEval, but this measures retrieval, not forgetting — an important distinction the authors argue prior work conflates. A controlled synthetic task with planted confounds further validates the approach: the learned weighting achieves 1.00 retention where uniform weighting fails at 0.62. The full substrate is open-source and all experiments run on a single CPU with no API calls.
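
The planted-confound idea can be sketched on a toy problem. The code below is not the paper's synthetic task or optimizer: the two-factor data, the budget, and the plain random search standing in for the unspecified gradient-free optimizer are all invented for illustration.

```python
import random

# Toy data, invented for illustration: (signal, confound, is_gold).
# Distractors score high on the planted confound factor.
ITEMS = [
    (0.9, 0.1, True), (0.8, 0.2, True), (0.7, 0.0, True),
    (0.2, 0.9, False), (0.1, 1.0, False), (0.3, 0.8, False), (0.0, 0.5, False),
]
BUDGET = 3  # number of memories kept after forgetting


def retention(weights, items=ITEMS, budget=BUDGET):
    """Fraction of gold items among the `budget` highest-value items."""
    w_signal, w_confound = weights
    ranked = sorted(
        items, key=lambda it: w_signal * it[0] + w_confound * it[1], reverse=True
    )
    kept_gold = sum(1 for it in ranked[:budget] if it[2])
    total_gold = sum(1 for it in items if it[2])
    return kept_gold / total_gold


def random_search(n_trials=200, seed=0):
    """Gradient-free search: sample weight vectors and keep the best-scoring one."""
    rng = random.Random(seed)
    best_w = (1.0, 1.0)  # start from uniform weights
    best_score = retention(best_w)
    for _ in range(n_trials):
        candidate = (rng.random(), rng.random())
        score = retention(candidate)
        if score > best_score:
            best_w, best_score = candidate, score
    return best_w, best_score


uniform_score = retention((1.0, 1.0))
learned_weights, learned_score = random_search()
```

In this toy setup, uniform weights let the confound push distractors into the retained set, while the search settles on weights that discount the confound relative to the signal.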

Key facts

01 Proposes V(m) = Σᵢ wᵢ fᵢ(m), a multi-factor memory value function with seven factors drawn from cognitive psychology
02 Seven factors: emotional intensity, goal relevance, value alignment, self/user relevance, task utility, reliability, and usage history
03 Learned multi-factor model retains 0.770 ± 0.011 of gold evidence on LongMemEval (blind regime, 479 usable cases)
04 Baselines: uniform weights 0.657, best single factor 0.518, recency 0.368; all gaps have 95% bootstrap CIs above zero
05 A neural network over the same factors ties the linear model in performance
06 Scoring goal relevance against the held-out evaluation question saturates retention at ~0.98, measuring retrieval rather than forgetting; flagged as a methodological pitfall
07 All experiments run on a single CPU with no API calls; substrate is open-source

Topics

#agent-memory #benchmarks #reasoning #multi-agent

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 12, 2026 · 10:05 UTC.
