Agent Memory
Agent memory is the machinery that lets an agent know things its context window no longer holds — working state within a long task, and persistent knowledge across sessions.
The split follows from the context window's limit. Short-term memory is the context window itself: ephemeral, complete, and finite, managed by compaction and careful loading. Long-term memory is storage outside the model: notes the agent writes, facts it accumulates, preferences it learns. It is persisted to files or databases and retrieved when relevant, which makes long-term memory largely a retrieval problem wearing a different hat.
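To make the short-term side concrete, here is a minimal Python sketch of compaction. It assumes a crude word count as a stand-in for a tokenizer and a hypothetical `summarize` helper in place of a model call. When the working context exceeds its budget, older turns are folded into one summary line while the most recent turns stay verbatim. `WorkingMemory`, `budget`, and `keep_recent` are illustrative names, not any product's API.

```python
from dataclasses import dataclass, field
from typing import List


def count_tokens(text: str) -> int:
    # Crude stand-in: a real agent would use the model's own tokenizer.
    return len(text.split())


def summarize(turns: List[str]) -> str:
    # Hypothetical summarizer: a real agent would ask the model for a summary.
    # Here we keep the first three words of each turn so the effect is visible.
    return "Summary: " + " | ".join(" ".join(t.split()[:3]) for t in turns)


@dataclass
class WorkingMemory:
    budget: int            # max tokens the working context may hold
    keep_recent: int = 2   # most recent turns always kept verbatim
    turns: List[str] = field(default_factory=list)

    def tokens(self) -> int:
        return sum(count_tokens(t) for t in self.turns)

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        if self.tokens() > self.budget:
            self.compact()

    def compact(self) -> None:
        # Fold everything except the most recent turns into one summary line.
        cut = max(len(self.turns) - self.keep_recent, 0)
        old, recent = self.turns[:cut], self.turns[cut:]
        if old:
            self.turns = [summarize(old)] + recent
```

Compaction here is best effort: if the recent turns alone exceed the budget, the context stays over budget, which is one reason agents also limit what they load in the first place.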
Production patterns range from file-based memory (Claude Code's CLAUDE.md and auto-memory: transparent, versionable, user-editable) to dedicated memory layers (Mem0, Zep) that extract, store, and retrieve facts automatically. The design questions that matter (what's worth remembering, when to recall it, how to forget what's stale) are the subject of Agent Memory Architecture. The failure modes are instructive too: remember too little and the agent re-learns your codebase every session; remember too much and stale facts poison fresh work.
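The file-based pattern, including a simple answer to "how to forget what's stale", can be sketched in a few lines. This is not how Claude Code implements CLAUDE.md or auto-memory; it is a hypothetical `FileMemory` class that assumes one dated fact per markdown bullet (`- [YYYY-MM-DD] fact`) and uses an age cutoff as its forgetting policy. Hand-written lines in the file are left untouched, so the file stays readable, diffable, and user-editable.

```python
import re
from datetime import date, timedelta
from pathlib import Path
from typing import List, Optional

# Hypothetical entry format: "- [YYYY-MM-DD] fact"
ENTRY = re.compile(r"^- \[(\d{4}-\d{2}-\d{2})\] (.+)$")


class FileMemory:
    def __init__(self, path: Path, max_age_days: int = 90) -> None:
        self.path = Path(path)
        self.max_age = timedelta(days=max_age_days)

    def remember(self, fact: str, when: Optional[date] = None) -> None:
        # Append a dated fact; the file is plain markdown a human can edit.
        when = when or date.today()
        with self.path.open("a", encoding="utf-8") as f:
            f.write(f"- [{when.isoformat()}] {fact}\n")

    def _lines(self) -> List[str]:
        if not self.path.exists():
            return []
        return self.path.read_text(encoding="utf-8").splitlines()

    def _is_stale(self, line: str, today: date) -> bool:
        # Only dated entries can go stale; other lines are never expired.
        m = ENTRY.match(line)
        return bool(m) and today - date.fromisoformat(m.group(1)) > self.max_age

    def recall(self, today: Optional[date] = None) -> List[str]:
        """Facts young enough to load at session start."""
        today = today or date.today()
        facts = []
        for line in self._lines():
            m = ENTRY.match(line)
            if m and not self._is_stale(line, today):
                facts.append(m.group(2))
        return facts

    def forget_stale(self, today: Optional[date] = None) -> int:
        """Rewrite the file without stale entries; return how many were removed."""
        today = today or date.today()
        lines = self._lines()
        kept = [line for line in lines if not self._is_stale(line, today)]
        self.path.write_text("".join(line + "\n" for line in kept), encoding="utf-8")
        return len(lines) - len(kept)
```

An age cutoff is the bluntest forgetting policy: a fact like a project's test command can stay true for years, so a real design might instead expire entries when they are contradicted or edited by the user.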
Frequently asked questions
- What's the difference between short-term and long-term agent memory?
- Short-term memory is the context window — everything the agent currently sees, gone when the session ends. Long-term memory is anything persisted outside it — files, databases, vector stores — that survives across sessions and gets selectively loaded back. The engineering is in the 'selectively': what to write down, and what to recall when.
- How do real systems implement agent memory?
- Mostly as retrieval over stored notes: the agent writes durable facts to storage (markdown files, a vector database via layers like Mem0 or Zep, plain records), then retrieves what's relevant at session start or mid-task. Claude Code's CLAUDE.md and auto-memory are the file-based version of the same pattern.
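Retrieval over stored notes can be sketched the same way. The Python below ranks notes by keyword overlap (Jaccard similarity over lowercase word sets) with the current task and returns the top few above a threshold. This is a deliberately simple stand-in: layers like Mem0 and Zep use embeddings or knowledge graphs, and `recall`, `k`, and `min_score` are hypothetical names, not either product's API.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    # Lowercase word set; real systems would embed the text instead.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def recall(notes: List[str], query: str, k: int = 3, min_score: float = 0.1) -> List[str]:
    """Return up to k notes whose Jaccard overlap with the query is >= min_score."""
    q = tokenize(query)
    scored: List[Tuple[float, str]] = []
    for note in notes:
        n = tokenize(note)
        union = q | n
        score = len(q & n) / len(union) if union else 0.0
        if score >= min_score:
            scored.append((score, note))
    # Highest overlap first; ties keep their stored order (sort is stable).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [note for _, note in scored[:k]]
```

At session start the query might be the user's first message; mid-task it might be the current subtask. Either way the threshold matters as much as the ranking, because loading weak matches is how irrelevant or stale facts leak into fresh work.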
Related
- Agent Memory Architecture: Short-Term, Long-Term, and When to Use Each. How AI agents remember: working memory vs. persistent long-term memory, what to store, how to retrieve it, and how to keep context small.
- Managing Claude Code Memory & Context: CLAUDE.md, /compact, and Auto-Memory. How Claude Code remembers: every CLAUDE.md scope and load order, path-scoped rules, the auto-memory system, and the context commands that keep sessions sharp.
- Context Window. The context window is the maximum text, measured in tokens, that an LLM can consider at once: prompt, conversation, documents, and its own output combined.
- Mem0. A memory layer for AI agents and apps: persistent, personalized long-term memory across sessions.
- RAG (Retrieval-Augmented Generation). RAG retrieves relevant documents from your own data and injects them into an LLM's prompt at query time, grounding answers in facts the model wasn't trained on.
- Mem0 vs Zep vs Letta: Agent Memory Compared (2026). Three philosophies of agent memory (Mem0's drop-in layer, Zep's temporal knowledge graphs, Letta's self-managing agents) and which fits your architecture.
- Letta. Stateful agents from the MemGPT creators: an Apache-2.0 server with self-editing memory, and Letta Code, the memory-first model-agnostic coding harness.
- Zep. Agent memory on temporal knowledge graphs: Zep Cloud for sub-200ms context retrieval at enterprise scale, with Graphiti as its open-source graph engine.
- AI Agent. An AI agent is an LLM-driven system that pursues a goal in a loop (calling tools, observing results, iterating) instead of returning one answer.