# Agent Memory Architecture: Short-Term, Long-Term, and When to Use Each

> How AI agents remember — working memory vs. persistent long-term memory, what to store, how to retrieve it, and how to keep context small.

Agent memory comes in two layers: short-term working memory (the context window for the current task) and long-term memory (facts persisted across sessions and retrieved when relevant). The skill is deciding what's worth remembering, storing it as distilled facts rather than raw transcripts, and retrieving only what the current turn needs — so the agent feels continuous without bloating context.

An agent without memory is a stranger every conversation — it forgets your name, your preferences, and what it did five minutes ago. Memory is what makes an agent feel continuous and competent. But "give it memory" doesn't mean "stuff everything into the prompt." Good agent memory is an architecture: two layers, each with a job.

## Two layers

### Short-term (working) memory

This is the **context window**: what the model can see for the current task, including the recent turns of the conversation, the current goal, and any long-term facts you've retrieved for this turn. It's fast and immediate, but bounded and ephemeral: once the window fills, older content has to be dropped or summarized, and when the session ends, all of it is gone. Keeping it tight is most of the [context engineering](/guides/prompting/context-engineering) battle.
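
One way to keep working memory tight is to assemble it under an explicit budget. The sketch below is illustrative only: `Turn`, `estimate_tokens`, and `build_context` are hypothetical names, and the word-count "tokenizer" is a crude stand-in for whatever tokenizer your model actually uses.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


def estimate_tokens(text: str) -> int:
    # Assumption: roughly one token per whitespace-separated word.
    return len(text.split())


def build_context(goal: str, turns: list[Turn], retrieved_facts: list[str],
                  budget: int) -> list[str]:
    """Assemble working memory: the goal, the retrieved facts, then as many
    of the most recent turns as fit in the remaining budget."""
    fixed = [f"GOAL: {goal}"] + [f"FACT: {fact}" for fact in retrieved_facts]
    remaining = budget - sum(estimate_tokens(line) for line in fixed)

    kept: list[str] = []
    # Walk backwards so the newest turns are considered first.
    for turn in reversed(turns):
        line = f"{turn.role}: {turn.text}"
        cost = estimate_tokens(line)
        if cost > remaining:
            break
        kept.append(line)
        remaining -= cost

    # Restore chronological order for the turns that made the cut.
    return fixed + list(reversed(kept))
```

The goal and retrieved facts are charged against the budget first, so older turns are what gets dropped when space runs out.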

### Long-term memory

This is knowledge **persisted outside the model** — in a database or vector store — that survives across sessions. It's how an agent recalls, next week, that you prefer TypeScript and you're on the Enterprise plan. Long-term memory comes in flavors worth distinguishing:

- **Semantic** — facts ("the user's company is Acme").
- **Episodic** — past events and interactions ("last time, we tried approach X and it failed").
- **Procedural** — how to do things (learned instructions, successful tool sequences).
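
The three flavors above can be carried as a tag on each stored record, which makes it possible to retrieve or expire them differently. This is a minimal sketch, not a schema from any particular library: `MemoryKind`, `MemoryRecord`, and `by_kind` are hypothetical names, and the procedural example text is invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class MemoryKind(Enum):
    SEMANTIC = "semantic"      # facts
    EPISODIC = "episodic"      # past events and interactions
    PROCEDURAL = "procedural"  # how to do things


@dataclass
class MemoryRecord:
    kind: MemoryKind
    content: str
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def by_kind(records: list[MemoryRecord], kind: MemoryKind) -> list[MemoryRecord]:
    """Return only the records of one kind."""
    return [r for r in records if r.kind is kind]


records = [
    MemoryRecord(MemoryKind.SEMANTIC, "The user's company is Acme."),
    MemoryRecord(MemoryKind.EPISODIC, "Last time, approach X was tried and failed."),
    MemoryRecord(MemoryKind.PROCEDURAL, "To release: run tests, then build, then tag."),
]
```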

## The core move: store less, remember more

The naive approach — append the entire conversation history to the prompt every turn — fails on cost, latency, context limits, and the "lost in the middle" effect where the model overlooks details buried in a huge context. The better pattern is to **extract and distill**: after a turn, save the salient *facts*, not the raw transcript; then at the next turn, **retrieve only the memories relevant to the current query** and inject those. The agent remembers more by keeping context small.

A library like [Mem0](/tools/mem0) implements exactly this extract-store-retrieve loop; frameworks like [LangGraph](/tools/langgraph) provide persistence/checkpointing for the working-memory side.
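
To make the extract, store, retrieve loop concrete without depending on any library, here is a toy version in plain Python. It is not the Mem0 or LangGraph API. `extract_facts`, `FactStore`, and the marker phrases are hypothetical; a real system would typically ask a model to do the distillation and use embedding similarity rather than word overlap.

```python
import re


def extract_facts(user_message: str) -> list[str]:
    """Toy extractor: keep sentences that state a preference or attribute.
    Assumption: a fixed list of marker phrases stands in for a model call."""
    sentences = re.split(r"(?<=[.!?])\s+", user_message.strip())
    markers = ("i prefer", "i use", "i'm on", "i am on", "my ")
    return [s for s in sentences if s.lower().startswith(markers)]


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class FactStore:
    def __init__(self) -> None:
        self.facts: list[str] = []

    def add(self, fact: str) -> None:
        if fact not in self.facts:  # skip exact duplicates
            self.facts.append(fact)

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        """Return up to k facts sharing the most words with the query.
        Word overlap is a stand-in for embedding similarity."""
        q = tokenize(query)
        scored = [(len(q & tokenize(fact)), fact) for fact in self.facts]
        scored = [(score, fact) for score, fact in scored if score > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for _, fact in scored[:k]]


store = FactStore()
message = ("Thanks for the help earlier! I prefer TypeScript for new services. "
           "I'm on the Enterprise plan. Anyway, what's next?")
for fact in extract_facts(message):
    store.add(fact)  # only the two distilled facts are stored

relevant = store.retrieve("Which plan am I on?", k=1)
```

Only the two distilled sentences are persisted, and only the one relevant to the current query is injected into the next prompt.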

> [!TIP]
> Scope memories — per user, per agent, per session — and filter retrieval by scope. It keeps recall relevant and prevents one user's memories leaking into another's context (a real privacy bug).
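
Scope filtering can be sketched as a visibility check applied before any relevance ranking. The names here (`Scope`, `ScopedMemory`, `visible_to`, `retrieve_scoped`) are hypothetical, and the rule that user-level memories are visible in every session of that user is one assumed policy among several reasonable ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scope:
    user_id: str
    agent_id: Optional[str] = None    # None means "any agent"
    session_id: Optional[str] = None  # None means "any session"


@dataclass
class ScopedMemory:
    scope: Scope
    content: str


def visible_to(memory_scope: Scope, request: Scope) -> bool:
    """A memory is visible when it belongs to the same user and every
    narrower field it sets matches the request."""
    if memory_scope.user_id != request.user_id:
        return False  # never cross user boundaries
    if memory_scope.agent_id is not None and memory_scope.agent_id != request.agent_id:
        return False
    if memory_scope.session_id is not None and memory_scope.session_id != request.session_id:
        return False
    return True


def retrieve_scoped(memories: list[ScopedMemory], request: Scope) -> list[str]:
    return [m.content for m in memories if visible_to(m.scope, request)]


memories = [
    ScopedMemory(Scope("alice"), "Prefers TypeScript"),
    ScopedMemory(Scope("alice", session_id="s1"), "Debugging the login flow"),
    ScopedMemory(Scope("bob"), "On the Enterprise plan"),
]
```

Applying the filter before ranking means a highly similar memory from another user can never win on score alone.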

## Pitfalls

- **Hoarding.** Persisting everything fills retrieval with noise; irrelevant retrieved memories actively degrade answers. Decide what's worth keeping.
- **Never forgetting.** Memory that never updates goes stale. New facts must supersede old ones and contradictions must be reconciled, or the agent confidently recalls outdated truths.
- **No deletion path.** If you store user data, you need to be able to expire and delete it. Build that in from the start.
- **Memory as a dumping ground for bad retrieval.** If the knowledge a *task* needs belongs in a knowledge base, serving it is [RAG](/guides/concepts/how-rag-works), not agent memory. Memory is for the agent's own continuity, not your document corpus.
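
Three of the pitfalls above (stale facts, no expiry, no deletion path) can be addressed in the shape of the store itself. The following is a minimal in-memory sketch under assumed names (`Entry`, `UserMemory`, `remember`, `recall`, `forget_user`); timestamps are passed in by the caller so the behavior is easy to test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    value: str
    written_at: float     # seconds, supplied by the caller
    ttl: Optional[float]  # None means "keep until deleted"


class UserMemory:
    def __init__(self) -> None:
        # user_id -> key -> Entry
        self._data: dict[str, dict[str, Entry]] = {}

    def remember(self, user_id: str, key: str, value: str,
                 now: float, ttl: Optional[float] = None) -> None:
        """Writing to an existing key supersedes the old value."""
        self._data.setdefault(user_id, {})[key] = Entry(value, now, ttl)

    def recall(self, user_id: str, key: str, now: float) -> Optional[str]:
        entry = self._data.get(user_id, {}).get(key)
        if entry is None:
            return None
        if entry.ttl is not None and now - entry.written_at > entry.ttl:
            del self._data[user_id][key]  # expired: drop it on read
            return None
        return entry.value

    def forget_user(self, user_id: str) -> int:
        """Delete everything stored for one user; returns entries removed."""
        return len(self._data.pop(user_id, {}))


mem = UserMemory()
mem.remember("u1", "plan", "Pro", now=0.0)
mem.remember("u1", "plan", "Enterprise", now=10.0)  # supersedes "Pro"
mem.remember("u1", "cart", "3 items", now=0.0, ttl=60.0)
```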

Once memory is in place, the other half of a capable agent is robust [tool calling](/guides/concepts/production-tool-calling) — and then [hardening it for production](/agents/meta-orchestration/agent-reliability-reviewer).

---

_Source: https://agentscamp.com/guides/concepts/agent-memory-architecture — Guide on AgentsCamp._