Making Sense of Memory in AI Agents
I've been catching up on the topic of memory management for AI agents recently and was overwhelmed by the amount of new terminology and concepts. This blog post serves as my working study notes, collecting what I've learned about memory for agents. Please note that this information is subject to change as I learn more about the topic.

What is agent memory?


Memory in AI agents is the ability to remember and recall important information across multiple user interactions. This ability enables agents to learn from feedback and adapt to user preferences, thereby enhancing the system's performance and improving the user experience.
Large Language Models (LLMs) that power AI agents are stateless and don't have memory as a built-in feature. LLMs learn and remember information during their training phase and store it in their model weights (parametric knowledge), but they don't immediately learn and remember what you just said. Therefore, every interaction with an LLM is essentially a fresh start: the LLM has no memory of previous inputs.
Therefore, to enable an LLM agent to recall what was said earlier in this conversation or in a previous session, developers must provide it with access to past interactions from the current and past conversations.
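To make this concrete, here is a minimal sketch of that workaround: keep the conversation in a plain list and replay it on every call. It assumes an OpenAI-style chat completions client, and the model name is purely illustrative.
```python
# A minimal sketch: the LLM is stateless, so we keep the history in a
# list and re-send it on every call. Assumes an OpenAI-style client;
# the model name is illustrative.
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_message: str) -> str:
    # The LLM remembers nothing between calls; replaying the full
    # history is what lets it "recall" earlier turns.
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative
        messages=history,
    )
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

chat("Hi, my name is Sam.")
print(chat("What's my name?"))  # works only because the history was replayed
```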

Types of agent memory


Agent memory can be categorized by the type of information stored and where it is stored. There are various ways to divide agent memory into distinct types, but at the highest level, they all differentiate between short-term (in-context) and long-term (out-of-context) memory:
  • In-context memory (Short-term memory) refers to the information available in the context window of the LLM. This can be both information from the current conversation as well as information pulled in from past conversations.
  • Out-of-context memory (Long-term memory) refers to the information stored in external storage, such as a (vector or graph) database.
“An agent can have two types of memory: In-context (short-term) memory and out-of-context (long-term) memory”
The most commonly seen taxonomy is based on the Cognitive Architectures for Language Agents (CoALA) paper, which distinguishes between four memory types that mimic human memory, drawing on the SOAR architecture from the 1980s. Below is a table of the different memory types, inspired by a similar one in the LangGraph documentation:
| Memory Type | What is stored | Human Example | Agent Example |
| --- | --- | --- | --- |
| Working memory | Contents of the context window | Current conversation (e.g., “Hi, my name is Sam.”) | Current conversation (e.g., “Hi, my name is Sam.”) |
| Semantic memory | Facts | Things I learned in school (e.g., “Water freezes at 0°C”) | Facts about a user (e.g., “Dog's name is Henry”) |
| Episodic memory | Experiences | Things I did (e.g., “Went to Six Flags on 10th birthday”) | Past actions (e.g., “Failed to calculate 1+1 without using a calculator”) |
| Procedural memory | Instructions | Instincts or motor skills (e.g., “How to ride a bike”) | Instructions in the system prompt (e.g., “Always ask follow-up questions before answering a question.”) |
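As a toy illustration of this taxonomy, the four memory types could be held in simple Python structures. The names and shapes below are my own sketch, not an API from the CoALA paper:
```python
# Illustrative only: CoALA's four memory types as plain Python structures.
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    working: list[dict] = field(default_factory=list)       # context window messages
    semantic: dict[str, str] = field(default_factory=dict)  # facts about the user/world
    episodic: list[str] = field(default_factory=list)       # past experiences and actions
    procedural: str = ""                                    # system prompt instructions

memory = AgentMemory(procedural="Always ask follow-up questions before answering.")
memory.working.append({"role": "user", "content": "Hi, my name is Sam."})
memory.semantic["dog_name"] = "Henry"
memory.episodic.append("Failed to calculate 1+1 without using a calculator.")
```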
However, there's also another approach to categorizing memory types for AI agents from a design pattern perspective. Sarah Wooders from Letta argues that an LLM is a tokens-in, tokens-out function, not a brain, and that overly anthropomorphized analogies are therefore not a good fit. Letta defines the types of agent memory differently:
  • Message Buffer (Recent messages) stores the most recent messages from the current conversation.
  • Core Memory (In-Context Memory Blocks) is specific information that the agent itself manages (e.g., the user's birthday or the boyfriend's name, if relevant to the current conversation).
  • Recall Memory (Conversational History) is the raw conversation history.
  • Archival Memory (Explicitly Stored Knowledge) is explicitly formulated information stored in an external database.
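To contrast the two taxonomies, here is how Letta's four components might look as plain data structures. This mirrors the concepts only; it is not Letta's actual API:
```python
# Illustrative only: Letta-style memory components as plain data structures.
from collections import deque

message_buffer = deque(maxlen=20)  # recent messages kept in the context window
core_memory = {                    # in-context memory blocks the agent itself edits
    "human": "Name is Sam. Dog's name is Henry.",
    "persona": "Friendly assistant that asks follow-up questions.",
}
recall_memory: list[dict] = []     # raw conversation history, searchable later
archival_memory: list[str] = []    # explicitly stored knowledge (an external DB in practice)

def on_turn(message: dict) -> None:
    message_buffer.append(message)  # short-term: evicted once the buffer is full
    recall_memory.append(message)   # long-term: the full raw history is kept
```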
The difference lies in how they design in-context and out-of-context memory. For example, CoALA's working memory is a single category, while Letta splits it into the message buffer and core memory. The long-term memory from the CoALA paper can be thought of as the out-of-context memory in Letta. However, the long-term memory types of procedural, episodic, and semantic aren't directly mappable to Letta's recall and archival memory. You can think of CoALA's semantic memory as Letta's archival memory, but the other types don't map cleanly onto each other. Notably, the CoALA taxonomy doesn't include the raw conversation history in long-term memory.

AI Agent Memory Management


Memory management in AI agents refers to how to manage information within the LLM's context window and in external storage, as well as how to transfer information between them. Richmond Alake lists the following core components of agent memory management: generation, storage, retrieval, integration, updating, and deletion (forgetting).

Managing memory in the context window


The goal of managing memory in the context window is to ensure that only relevant information is retained, thereby avoiding confusing the LLM with incorrect, irrelevant, or contradictory information. Additionally, as the conversation progresses, the conversation history grows (consuming more tokens), which leads to slower responses and higher costs and can eventually hit the context window's limit.
To mitigate this problem, you can maintain the conversation history in different ways. For example, you can manually remove old and obsolete information from the context window. Alternatively, you can periodically summarize the previous conversation and retain only the summary, then delete the old messages.
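Here is a hedged sketch of the summarize-and-trim strategy. The thresholds are arbitrary, and `summarize` is a stand-in for what would be an LLM call in practice:
```python
# A sketch of summarize-and-trim context management: once the history
# exceeds a threshold, older messages are compressed into one summary.
MAX_MESSAGES = 20  # arbitrary threshold for this sketch
KEEP_RECENT = 6    # recent turns to keep verbatim

def summarize(messages: list[dict]) -> str:
    # Stand-in: in a real system, ask the LLM to summarize these turns.
    return "Summary of the earlier conversation: ..."

def compact_history(history: list[dict]) -> list[dict]:
    if len(history) <= MAX_MESSAGES:
        return history
    old, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    summary_message = {"role": "system", "content": summarize(old)}
    return [summary_message] + recent  # old messages are dropped; only the summary survives
```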

Managing memory in external storage


The main goal of memory management in external storage is to prevent memory bloat and to ensure the quality and relevance of the stored information. The four core operations for managing memory in external storage include the following (a small sketch in code follows the list):
  • ADD: Adding new information to the external storage.
  • UPDATE: Identifying existing information, modifying it to reflect new information, or correcting outdated information (e.g., updating a user's new address).
  • DELETE: Forgetting obsolete information to prevent memory bloat and degradation of the information quality.
  • NOOP: This is the decision point where the memory management system determines that the current interaction contains no new, relevant, or contradictory information that warrants a database transaction.
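Here is that sketch: the four operations as a dispatch, where `decide_operation` is a stand-in for an LLM prompt that compares incoming information against the stored memories:
```python
# A sketch of the ADD / UPDATE / DELETE / NOOP operations over a
# simple list-backed memory store.
def decide_operation(new_info: str, existing: list[str]) -> tuple[str, int | None]:
    # Stand-in: an LLM would compare new_info against existing memories
    # and return one of "ADD", "UPDATE", "DELETE", "NOOP" plus an index.
    return ("NOOP", None)

def apply_operation(new_info: str, store: list[str]) -> None:
    op, idx = decide_operation(new_info, store)
    if op == "ADD":
        store.append(new_info)   # genuinely new, relevant information
    elif op == "UPDATE":
        store[idx] = new_info    # e.g., the user moved: replace the old address
    elif op == "DELETE":
        store.pop(idx)           # obsolete or contradicted information
    # NOOP: nothing warrants a database transaction, so do nothing
```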

Transferring information between the context window and external storage


One important question developers need to answer is when to manage memory, specifically when to transfer information from the context window to external storage. The LangChain blog post on “memory for agents” differentiates between updating memory in the hot path and in the background, while I've also seen the two referred to as explicit and implicit memory updates in Philipp Schmid's blog on “Memory in Agents”.
Explicit memory (hot path) describes the agent memory system's ability to autonomously recognize important information and decide to explicitly remember it (via tool calling). Explicit memory in humans is the conscious storage of information (e.g., episodic and semantic memory). While remembering important information in the hot path mirrors how humans remember, it can be challenging to implement a robust solution that reliably recognizes which information is important to remember.
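As an illustration, explicit memory is usually implemented by exposing a tool the agent can decide to call. The sketch below uses the common JSON-schema function-calling convention; the in-memory list is a stand-in for a real database:
```python
# A sketch of hot-path memory: the agent itself decides when to call
# the save_memory tool during the conversation.
memories: list[str] = []

def save_memory(fact: str) -> str:
    """Called by the agent when it judges a fact worth remembering."""
    memories.append(fact)  # in practice: write to a vector or graph database
    return f"Stored: {fact}"

# Tool schema in the common JSON-schema function-calling format.
save_memory_tool = {
    "type": "function",
    "function": {
        "name": "save_memory",
        "description": "Store an important fact about the user for later conversations.",
        "parameters": {
            "type": "object",
            "properties": {"fact": {"type": "string"}},
            "required": ["fact"],
        },
    },
}
```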
Implicit memory (background) describes when memory management is programmatically defined in the system at specific times during or after a conversation. Implicit memory in humans is the unconscious storage of information (e.g., procedural memory). The Google whitepaper on session and memory describes the following three scenarios:
  1. After a session: You can batch process the entire conversation once the session ends (see the sketch after this list).
  2. In periodic intervals: If your use case involves long-running conversations, you can define an interval at which session data is transferred to long-term memory.
  3. After every turn: If your use case requires real-time updates. Keep in mind, however, that the raw conversation history is typically appended and stored in the context window for a short period (“short-term memory”).
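For the first scenario, a background job might look like the sketch below, where `extract_facts` stands in for an LLM extraction prompt:
```python
# A sketch of implicit (background) memory: the transcript is processed
# in one batch after the session ends, outside the hot path.
def extract_facts(transcript: list[dict]) -> list[str]:
    return []  # stand-in: an LLM call that extracts facts worth keeping

def on_session_end(transcript: list[dict], store: list[str]) -> None:
    for fact in extract_facts(transcript):
        # In practice, run the ADD/UPDATE/DELETE/NOOP dispatch shown earlier
        # instead of blindly appending.
        store.append(fact)
```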

Implementing agent memory


When implementing agent memory, consider where to store the memory information:
  • Current conversation history is usually implemented as a simple list of past user queries, assistant messages, and maybe tool calls or reasoning.
  • Instructions are typically found in text or Markdown files, similar to well-known examples such as CLAUDE.md files.
  • Other information is usually stored in a database, depending on what type of retrieval method is suitable for your data.
If you're interested in actual code implementation, I recommend you check out how my colleague JP Hwang implemented an agentic memory layer for a conversational AI using Weaviate or how Adam Łucek implements the four different types of memory from the CoALA paper (working memory, episodic memory, semantic memory, and procedural memory).

Challenges of agent memory design


Implementing memory for agents is currently a challenging task. The difficulty lies in optimizing the system to avoid slower response times while simultaneously solving the complex problem of determining what information is obsolete and should be permanently deleted:
  • Latency: Constantly deciding whether the agent needs to retrieve new information from, or offload data to, the memory bank can lead to slower response times.
  • Forgetting: This seems to be the hardest challenge for developers at the moment. How do you automate a mechanism that decides when and what information to permanently delete? Managing the information stored in external memory is important to avoid memory bloat and the degradation of information quality. One deliberately simplistic policy is sketched after this list.
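Here is that sketch: a toy forgetting policy that drops memories unused for a fixed period. The `last_accessed` bookkeeping and thresholds are my own assumptions; deciding what is truly obsolete generally still needs an LLM (or a human) in the loop:
```python
# A toy time-based forgetting policy: evict memories unused for 60 days.
import time

MAX_AGE_SECONDS = 60 * 86400  # arbitrary: forget anything unused for 60 days

memory_bank = [
    {"fact": "User lives in Berlin", "last_accessed": time.time()},
    {"fact": "User's old address", "last_accessed": time.time() - 90 * 86400},
]

def forget_stale(bank: list[dict]) -> list[dict]:
    now = time.time()
    # Keep only memories that were accessed recently enough.
    return [m for m in bank if now - m["last_accessed"] <= MAX_AGE_SECONDS]

memory_bank = forget_stale(memory_bank)  # the stale old address is dropped
```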

Frameworks for AI Agent Memory


The ecosystem of developer tools for implementing memory solutions for agents is rapidly growing and attracting investors' attention. There are frameworks dedicated to solving the agent memory problem, such as mem0 (see my example implementation with Weaviate), Letta based on the MemGPT design pattern (see my example implementation), Cognee, and zep.
However, many agent orchestration frameworks, such as LangChain and LangGraph, LlamaIndex (see my example implementation), CrewAI, and Google's Agent Development Kit (ADK) (see my example implementation), also offer solutions for AI agent memory management. Additionally, some model providers, such as Anthropic, provide built-in memory tools (see my example implementation).

Summary


As I dive into the topic of memory management for AI agents, I'm learning the importance of remembering information not only from the current conversation but also from past conversations, as well as how to effectively modify and forget outdated and obsolete information.
As the field evolves, different approaches and terminology are emerging with it. On the one hand, you have approaches that categorize memory types into semantic, episodic, or procedural memory, analogous to human memory. On the other hand, you have approaches, such as Letta's, that use architecture-focused terminology.
Nevertheless, the core challenge of memory design for AI agents is how to move information between an LLM's context window (short-term memory) and external storage (long-term memory). This involves deciding when to manage updates (hot path vs. background) while overcoming key challenges such as latency and the difficulty of forgetting obsolete data.