HORMA uses hierarchical memory to cut agent token use to 22% of baseline
Researchers introduce HORMA, a hierarchical memory agent that organizes LLM agent experience into a file-system-like structure and uses a reinforcement-learning-trained navigator to retrieve only the minimal context needed, reducing token usage to at most 22.17% of baseline in long conversation tasks.
Score breakdown
HORMA reduces agent token consumption to at most 22.17% of baseline while maintaining or improving task performance, directly addressing the inference cost and latency penalties that make long-horizon LLM agents expensive to run.
Hao-Lun Hsu, Nikki Lijing Kuang, and Boyi Liu present HORMA, a Hierarchical Organize-and-Retrieve Memory Agent, as a solution to a core limitation of LLM-based agents: their statelessness forces all task-relevant information into ever-growing input contexts, degrading reasoning quality, increasing inference cost, and raising latency. Existing remedies — lossy compression and similarity-based retrieval — fail to capture the temporal structure and causal dependencies that multi-step agentic tasks require.
HORMA addresses this by decomposing working memory into two stages. The construction module organizes experience into a file-system-like hierarchical structure, linking summarized entities to their corresponding raw trajectories. It iteratively refines this structure by distinguishing between failures caused by missing information and those caused by misleading or overloaded context. The retrieval module then uses a lightweight agent trained with reinforcement learning to traverse the hierarchy and select the minimal yet sufficient context for the current task, reducing latency along the critical execution path.
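To make the two stages concrete, here is a minimal Python sketch of a file-system-like memory tree and a navigator that walks it. It is an illustration under loud assumptions, not the authors' implementation: the `MemoryNode` type, the `overlap` score, the `navigate` function, and the whitespace token count are all hypothetical, and a greedy word-overlap rule stands in for the reinforcement-learning-trained navigation agent described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryNode:
    """One 'directory' or 'file' in the memory hierarchy (hypothetical type)."""
    name: str
    summary: str                      # summarized entity
    raw_trajectory: str = ""          # linked raw record; set on leaves only
    children: list[MemoryNode] = field(default_factory=list)


def overlap(query: str, text: str) -> int:
    """Toy relevance score: number of shared lowercase words.

    Assumption: stands in for the learned navigation policy.
    """
    return len(set(query.lower().split()) & set(text.lower().split()))


def navigate(root: MemoryNode, query: str, token_budget: int) -> list[str]:
    """Descend from the root, always following the best-matching child.

    Returns the leaf's raw trajectory if it fits the budget,
    otherwise falls back to the leaf's shorter summary.
    """
    node = root
    while node.children:
        node = max(node.children, key=lambda c: overlap(query, c.summary))
    raw_tokens = len(node.raw_trajectory.split())  # crude whitespace token count
    if node.raw_trajectory and raw_tokens <= token_budget:
        return [node.raw_trajectory]
    return [node.summary]


# A tiny hand-built hierarchy: summaries at every level, raw records at leaves.
root = MemoryNode("memory", "all agent experience", children=[
    MemoryNode("kitchen", "kitchen tasks such as heat egg and clean mug", children=[
        MemoryNode("heat_egg", "heat egg with microwave",
                   raw_trajectory="go to fridge; take egg; go to microwave; heat egg"),
        MemoryNode("clean_mug", "clean mug in sink",
                   raw_trajectory="go to counter; take mug; go to sink; clean mug"),
    ]),
    MemoryNode("bedroom", "bedroom tasks such as move book", children=[
        MemoryNode("move_book", "move book to desk",
                   raw_trajectory="go to shelf; take book; go to desk; put book"),
    ]),
])

print(navigate(root, "heat an egg", token_budget=20))
print(navigate(root, "heat an egg", token_budget=5))
```

With a generous budget the navigator returns only the one matching raw trajectory instead of the whole memory; with a tight budget it returns the summary. That is the sense in which retrieval aims for minimal yet sufficient context.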
Across three benchmarks — ALFWorld, LoCoMo, and LongMemEval — HORMA improves task performance under constrained context budgets and requires at most 22.17% of the token usage of baseline methods in long conversation tasks. The system also generalizes effectively to unseen tasks and consistently achieves better efficiency-performance trade-offs than existing approaches.
Key facts
- 01 HORMA stands for Hierarchical Organize-and-Retrieve Memory Agent, proposed by Hao-Lun Hsu, Nikki Lijing Kuang, and Boyi Liu.
- 02 It organizes agent experience into a file-system-like hierarchical structure linking summarized entities to raw trajectories.
- 03 Working memory is decomposed into two stages: structured memory construction and navigation-based retrieval.
- 04 The construction module distinguishes failures caused by missing information from those caused by misleading or overloaded context.
- 05 A lightweight navigation agent trained with reinforcement learning traverses the hierarchy to select minimal yet sufficient context.
- 06 Evaluated on ALFWorld, LoCoMo, and LongMemEval benchmarks.
- 07 HORMA requires at most 22.17% of baseline token usage in long conversation tasks.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 11, 2026 · 08:34 UTC.