Claude-Mem Gave Claude Code a Memory and It Changed Everything
The most frustrating thing about Claude Code in the early days wasn't the model quality. It wasn't the context window limits. It wasn't the cost. It was the amnesia.
Every single session starts from scratch. You open your terminal, type claude, and the agent has absolutely no memory of anything you've done together before. Yesterday you spent forty minutes debugging a tricky RLS policy in Supabase. You figured it out together. The agent understood your schema, your edge cases, your exact issue. Today? Gone. Completely gone. You're strangers again.
CLAUDE.md helps. I wrote about it earlier in this series. You can put project conventions, architectural decisions, and coding rules in a markdown file that gets loaded at the start of every session. That's great for static context. "Use Tailwind. Don't hardcode colors. Run the build after editing MDX files." That kind of thing works perfectly in CLAUDE.md.
But CLAUDE.md can't capture what happened. It can't know that session A discovered your API uses snake_case while your frontend uses camelCase and you decided to keep it that way. It can't remember that you tried three different approaches to the auth middleware before finding one that worked. It can't warn you that upgrading that particular dependency broke the build last time you tried it.
That's the gap. Static context versus dynamic, session-to-session knowledge. Claude-mem fills that gap.
The Groundhog Day Problem
Let me make the problem concrete.
Session A: You're building a new API endpoint. Claude Code reads through your existing endpoints, notices the snake_case convention, follows it perfectly. You also discover that your database connection pooling has a quirk where connections don't release properly unless you explicitly close them in a finally block. You and Claude figure this out after twenty minutes of debugging. Session ends.
Session B, the next morning: You're building another endpoint. Claude Code reads the same files and picks up the snake_case convention again, because it can see the existing code. Good. But then it writes a database query without the finally block cleanup. The same bug you already solved yesterday. You catch it, explain it again, fix it again. Twenty minutes you didn't need to spend.
Session C: Same thing. Different bug. Same category of problem. You've already learned this lesson. Claude hasn't.
Multiply this across weeks of daily use. Every debugging insight, every architectural decision, every "oh that's how this project works" moment evaporates at the end of each session. You're accumulating knowledge. Your tool isn't.
This is fundamentally different from working with a human colleague. When you pair-program with someone for a week, they remember what you worked on together. They build context. They learn your codebase alongside you. Claude Code, without memory, is like hiring a brilliant contractor who gets a full memory wipe every evening. This is why context engineering, not prompt engineering, is the real skill -- and memory is a critical part of that context infrastructure.
What Claude-Mem Is
Claude-mem is a plugin that gives Claude Code persistent, cross-session memory. It runs as an MCP server alongside your Claude Code sessions, capturing observations about what happens during your work, compressing them into dense semantic summaries using AI, and storing them in a local database. When you start a new session, it searches that database and injects relevant past context.
The result: Claude Code remembers. Not everything, and not perfectly. But the important stuff. The debugging insights, the architectural decisions, the lessons learned, the patterns discovered. It carries knowledge forward from session to session in a way that fundamentally changes how the tool works over time.
I want to be precise about what this is and isn't. It's not a complete session replay. It's not recording every keystroke. It's more like having a diligent colleague who takes notes during every pairing session, summarizes the key takeaways, and reviews those notes before the next session starts.
How It Works Under the Hood
Claude-mem operates through Claude Code's lifecycle hooks. These are specific points in the session lifecycle where plugins can inject behavior. There are four that matter.
SessionStart. When you open a new Claude Code session, claude-mem searches its database for observations relevant to the current project, the files you're likely to work with, and the patterns you've established. It injects this context into the session, so Claude Code starts with awareness of what's happened before. This is the hook that makes past sessions visible to future ones.
UserPromptSubmit. Every time you type a prompt, claude-mem captures it. Not to store your raw text forever, but for pattern recognition. Over time, it builds an understanding of what you work on, what you ask about, and what areas of the codebase you focus on. This improves the relevance of what gets surfaced in future SessionStart injections.
PostToolUse. This is the heavy lifter. Every time Claude Code uses a tool, whether it reads a file, makes an edit, runs a bash command, or anything else, claude-mem observes the result. The raw tool output might be anywhere from 1,000 to 10,000 tokens. Claude-mem compresses that down to roughly 500-token semantic observations. Not a simple truncation. An AI-powered compression that extracts the meaning: what was done, why it mattered, what was learned.
If Claude Code reads a file and discovers an unexpected pattern, that gets captured. If it runs a test suite and three tests fail with a specific error, the error and its context get captured. If it makes an edit that fixes a bug, the nature of the bug and the fix get captured. Every meaningful tool interaction becomes a compressed observation.
Stop. When Claude Code finishes responding, claude-mem generates a session summary. This is the high-level "what happened this session" record. It ties together the individual observations into a coherent narrative of what was accomplished, what was attempted, and what was learned.
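The capture-then-summarize flow of those last two hooks can be sketched in a few lines. This is a toy stand-in, not claude-mem's implementation: the hook names are borrowed from Claude Code's lifecycle, but the real PostToolUse handler uses AI compression, not word truncation.

```python
# Hook names match Claude Code's lifecycle hooks; the handler bodies are
# illustrative stand-ins -- claude-mem's real handlers call an AI
# summarizer and a SQLite store.
observations = []

def on_post_tool_use(tool_output, budget_words=80):
    """Compress raw tool output into a short observation (naive stand-in
    for the AI-powered ~500-token compression)."""
    summary = " ".join(tool_output.split()[:budget_words])
    observations.append(summary)

def on_stop():
    """Tie the session's observations together into a summary record."""
    return (f"Session summary ({len(observations)} observations): "
            + " | ".join(observations))

on_post_tool_use("pytest: 3 failed, 42 passed. TypeError in auth middleware "
                 "token refresh when the cached token has expired.")
print(on_stop())
```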
Each observation gets categorized by type: decision, bugfix, feature, refactor, discovery, or change. It gets tagged with relevant concepts and file references. And it gets stored in a local SQLite database with full-text search. The database lives on your machine. Nothing leaves your system.
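A minimal sketch of what that storage layer could look like, using SQLite's FTS5 full-text index. The schema and function names here are illustrative assumptions, not claude-mem's actual schema:

```python
import sqlite3

# Requires an SQLite build with FTS5 (standard in CPython distributions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE observations (
        id INTEGER PRIMARY KEY,
        type TEXT CHECK (type IN ('decision','bugfix','feature',
                                  'refactor','discovery','change')),
        files TEXT,                              -- referenced file paths
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- Full-text index over the compressed summary and concept tags.
    CREATE VIRTUAL TABLE obs_fts USING fts5(summary, concepts);
""")

def save_observation(obs_type, files, summary, concepts):
    cur = conn.execute(
        "INSERT INTO observations (type, files) VALUES (?, ?)",
        (obs_type, files))
    conn.execute(
        "INSERT INTO obs_fts (rowid, summary, concepts) VALUES (?, ?, ?)",
        (cur.lastrowid, summary, concepts))
    conn.commit()

def search(query):
    return conn.execute(
        "SELECT o.type, obs_fts.summary "
        "FROM obs_fts JOIN observations o ON o.id = obs_fts.rowid "
        "WHERE obs_fts MATCH ? ORDER BY rank",
        (query,)).fetchall()

save_observation(
    "bugfix", "db/pool.py",
    "Pooled connections leak unless explicitly closed in a finally block.",
    "database, connection-pooling")
print(search("finally"))
```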
The Layered Memory System
Here's where the design gets clever. Not all observations are equally valuable, and the value changes over time. What happened yesterday is highly relevant. What happened three months ago might be relevant, but you don't need the same level of detail.
Claude-mem implements a layered compression system. Recent observations stay detailed. As they age, they get progressively summarized. The specifics of what you debugged last Tuesday eventually compress into a higher-level observation like "the auth middleware has a known issue with token refresh timing that requires explicit cache invalidation."
This mirrors how human memory actually works. Your meeting from yesterday? You remember who said what, the specific decisions, the arguments for and against each option. A meeting from three months ago? You remember the outcome and maybe one or two key points. The detail fades, but the important knowledge persists.
This is critical for long-term viability. Without compression, the database would grow into an unmanageable dump. Every file read, every command output, every edit would pile up until searching it became slow and the results became noisy. The layered system keeps the database useful. Recent work is detailed. Older work is distilled. Ancient work is just the lessons.
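The tiering could be modeled as a simple age-to-detail mapping. The thresholds below are hypothetical illustrations; claude-mem's actual compression schedule may differ:

```python
from datetime import datetime, timedelta

# Hypothetical tier thresholds -- the real schedule may differ.
TIERS = [
    (timedelta(days=7),  "full"),     # recent: the ~500-token observation
    (timedelta(days=30), "summary"),  # older: a one-paragraph summary
    (timedelta(days=90), "lesson"),   # old: a single-sentence takeaway
]

def detail_level(created_at, now=None):
    """Return which compression tier an observation belongs in."""
    now = now or datetime.now()
    age = now - created_at
    for threshold, level in TIERS:
        if age <= threshold:
            return level
    return "lesson"  # anything older keeps only the distilled lesson

now = datetime(2025, 6, 1)
print(detail_level(now - timedelta(days=2), now))    # full
print(detail_level(now - timedelta(days=20), now))   # summary
print(detail_level(now - timedelta(days=200), now))  # lesson
```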
The /reflect Workflow
This is, in my opinion, the single most valuable thing I've built on top of claude-mem. It's a custom workflow I run after commits, and it turns mistakes into persistent guardrails.
Here's how it works. After I commit code, the workflow triggers. It reviews the git diff of the recent commits and looks for specific patterns:
Wrong assumptions. Did the session assume something about the codebase, an API, or the requirements that turned out to be false? If Claude Code assumed an endpoint returned JSON when it actually returned XML, that's a wrong assumption worth capturing.
Wasted effort. Did we build something that had to be reverted, significantly simplified, or completely redone? If we spent thirty minutes implementing a complex caching layer only to realize the data source already handles caching, that's wasted effort.
Multiple failed attempts. Did a fix or implementation take three or more tries to get right? If we tried three different approaches to solve a CSS layout issue before finding one that worked, the winning approach is worth remembering.
Non-obvious root causes. Was a bug's cause surprising or counterintuitive? If a production error turned out to be caused by a timezone mismatch in a library that silently defaults to UTC, that's the kind of thing you want to remember forever.
Useful patterns discovered. Did we find a reusable approach worth remembering? If we discovered that a particular testing pattern catches edge cases that unit tests miss, that's institutional knowledge.
When any of these triggers fire, the workflow saves each lesson to claude-mem with a LESSON: prefix. The format is deliberate. Something like:
"LESSON: Always check RLS policies before adding new Supabase tables - row-level security silently blocks all access by default."
"LESSON: The sharp-image library silently downgrades quality when given HEIC inputs. Convert to PNG first."
"LESSON: When upgrading Next.js, check middleware compatibility first. The middleware API changes between major versions without deprecation warnings."
These are specific, actionable, and born from actual pain. They're not theoretical best practices from a style guide. They're lessons learned from real sessions where something went wrong.
Here's the payoff. At the start of a new session, when I'm about to work in a specific domain, I search for relevant lessons:
search("LESSON supabase")
search("LESSON frontend auth")
search("LESSON next.js upgrade")
Past mistakes become future guardrails. The lesson about RLS policies surfaces before I create a new table, not after I spend twenty minutes wondering why the insert isn't working. The lesson about the middleware API surfaces before I start the upgrade, not after the build breaks.
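The save-and-recall pattern is simple enough to sketch with a plain list standing in for claude-mem's database. The helper names save_lesson and recall are hypothetical, not claude-mem's API:

```python
# A plain list stands in for claude-mem's database; save_lesson and recall
# are hypothetical helper names.
lessons = []

def save_lesson(text):
    """Store a lesson with the deliberate LESSON: prefix."""
    lessons.append("LESSON: " + text)

def recall(*keywords):
    """Return every lesson matching all keywords, case-insensitively."""
    return [l for l in lessons
            if all(k.lower() in l.lower() for k in keywords)]

save_lesson("Always check RLS policies before adding new Supabase tables.")
save_lesson("Check middleware compatibility before upgrading Next.js.")
print(recall("supabase"))           # surfaces the RLS lesson
print(recall("next.js", "upgrad"))  # substring match catches 'upgrading'
```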
This is the closest I've gotten to making Claude Code genuinely learn from experience. Not in a general AI training sense, but in a practical "this specific project has these specific pitfalls" sense.
Searching Past Sessions
Beyond the automatic context injection at session start, claude-mem gives you on-demand access to your history. The search capability is one of those features that sounds simple but changes how you work.
"Did we already solve this?" Seriously, this comes up constantly. You're looking at a bug and you have a nagging feeling you've seen it before. Instead of trying to remember which session it was, you search. If claude-mem has it, you get the observation with context. If not, at least you know you're looking at something new.
"How did we handle auth last time?" When you're building a new feature that touches authentication, you can pull up every observation tagged with auth-related concepts. Not just what you built, but the decisions you made and why you made them.
"What broke when we upgraded Next.js?" Before you start any risky operation, search for past observations about it. If previous sessions captured problems with a particular upgrade or migration, you'll know about them before you start, not after something breaks.
The search is full-text over the SQLite database. It's fast, it's local, and it's private. You can search by concept, by file reference, by observation type, or by plain keywords. The results come back ranked by relevance, and recent observations rank higher than old ones.
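One way to get that recency bias is to discount raw text relevance by an exponential age decay. The 30-day half-life and the weighting below are illustrative assumptions, not claude-mem's actual ranking formula:

```python
import math

# Hypothetical ranking: text relevance discounted by age, so recent
# observations outrank older ones of similar match quality.
def ranked(results, half_life_days=30.0):
    """results: list of (relevance, age_days, summary) tuples."""
    def score(r):
        relevance, age_days, _ = r
        return relevance * math.exp(-age_days * math.log(2) / half_life_days)
    return sorted(results, key=score, reverse=True)

hits = [
    (0.9, 90, "old but highly relevant"),
    (0.7, 2, "recent and fairly relevant"),
]
print(ranked(hits)[0][2])  # the recent hit wins despite lower raw relevance
```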
I find myself searching claude-mem before starting almost any non-trivial task now. It takes seconds, and the number of times it's surfaced something useful that I'd completely forgotten is enough to justify the entire plugin.
MEMORY.md: The Auto-Loaded Index
Claude-mem maintains a MEMORY.md file that gets automatically loaded as context. Think of it as the summary layer on top of the database. It contains the most recent and relevant observations in a format that's easy for Claude Code to parse at session start.
There's a practical constraint here: lines beyond the first 200 get truncated at load time. This is intentional. The file needs to stay concise enough to fit into the context window without eating too much of the token budget. If MEMORY.md grew unbounded, it would crowd out the actual context you need for the current task.
The approach I've settled on is treating MEMORY.md as an index. It contains high-level summaries and pointers to more detailed topic-specific files. If I have a lot of accumulated knowledge about, say, debugging patterns in my project, that goes in a separate file that I can reference when needed. MEMORY.md just has a line pointing to it.
Think of it this way: MEMORY.md is the table of contents. The topic files are the chapters. The SQLite database is the unabridged version. You interact with whatever layer of detail you need for the current task.
Keeping MEMORY.md under 200 lines requires discipline. I periodically review it and prune observations that are no longer relevant. Finished features get compressed into single-line summaries. Resolved bugs get removed unless the lesson is still valuable. It's maintenance, but it's low-effort maintenance that keeps the system useful.
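A quick way to keep yourself honest about the budget is a check like this. The 200-line limit comes from the truncation behavior described above; the function itself is my own helper, not part of claude-mem:

```python
MAX_LINES = 200  # the auto-load budget described above

def check_memory_file(text, max_lines=MAX_LINES):
    """Report how much of a MEMORY.md body survives truncation."""
    lines = text.splitlines()
    return {"total": len(lines),
            "kept": min(len(lines), max_lines),
            "truncated": max(0, len(lines) - max_lines)}

memory = "\n".join(f"- note {i}" for i in range(250))
print(check_memory_file(memory))
```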
What Claude-Mem Doesn't Do
I want to be honest about the limitations because overselling tools is a disease in this industry.
Compression loses detail. When a 5,000-token tool output gets compressed to 500 tokens, information is lost. That's the point, and the AI compression does a good job of keeping what matters, but it's not lossless. If you need the exact error message from a session two weeks ago, the compressed observation might not have it. For critical details, I still sometimes manually note things in CLAUDE.md or project documentation.
Very old observations may not surface. The search is good, but it's not omniscient. If you're looking for something from months ago and you don't remember the right keywords, it might not come up. The layered compression means old observations are highly summarized, which means there's less text for the search to match against. This is a tradeoff. You could keep everything at full detail, but then search results would be noisy and the database would be enormous.
It's not a replacement for CLAUDE.md. This is important. CLAUDE.md is for static project configuration. Your coding conventions, your architecture rules, your build commands. These don't change between sessions. They should live in CLAUDE.md where they're guaranteed to be loaded every time. Claude-mem handles the dynamic layer, the things that change session to session. What you worked on, what you learned, what broke. Both are necessary. Neither replaces the other.
It requires a running MCP server. Claude-mem runs as an MCP server alongside your Claude Code sessions. If it's not configured, there's no memory. If the server crashes mid-session, observations from that session might not be captured. In practice this is rare, but it's worth knowing.
It doesn't make Claude Code smarter. Claude-mem gives Claude Code access to more context. That's powerful, but it doesn't change the underlying model's capabilities. If the model would make a mistake even with the right context, memory won't help. It's an information problem solver, not an intelligence amplifier.
The Compound Effect
Here's what surprised me most about using claude-mem over months: the value compounds.
In week one, the memory database is sparse. There's not much to inject at session start. The tool helps, but modestly.
By week four, there's a meaningful collection of observations. Common patterns are captured. Key debugging insights are saved. The agent starts sessions with actual awareness of project history.
By month three, it's a different experience entirely. I start a session, and Claude Code already knows the tricky parts of my codebase. It knows which dependencies have weird edge cases. It knows which architectural decisions I made and why. It knows what patterns to follow because it can see the observations from dozens of sessions that followed those patterns.
The compounding happens because each session generates observations that improve future sessions, which generate better observations, which improve future sessions even more. It's a positive feedback loop. The tool gets better at helping you as you use it more, not because the model improves, but because the context improves.
This is fundamentally different from a tool that performs the same regardless of how long you've used it. Claude Code without memory is a stateless tool. Claude Code with claude-mem is a tool that learns your project. The difference, over time, is enormous.
Setting It Up
Claude-mem runs as an MCP server plugin. The setup involves adding it to your Claude Code configuration, pointing it at a database location, and letting it run. The database is local SQLite. No cloud dependency, no data leaving your machine.
Once configured, it works automatically. You don't need to manually save observations or tag things or manage the database. The lifecycle hooks handle capture. The AI handles compression. The search handles retrieval. Your job is just to work normally and occasionally search when you need past context.
The /reflect workflow I described earlier is an addition I built on top. It's not part of claude-mem itself; it's a custom command that uses claude-mem's save_observation tool. You can use claude-mem without it. But I think the combination is where the real value is. Automatic observation capture gives you breadth. Deliberate lesson capture gives you depth.
Why This Matters Beyond Tooling
The best tools don't just help you work. They help you learn from your work.
Every software project accumulates institutional knowledge. Why was this architecture chosen? What alternatives were considered? What broke the last time someone changed this module? In most teams, this knowledge lives in people's heads. When they leave, it leaves with them. In a solo developer workflow, it lives in your head. When you context-switch to another project for a month and come back, it's gone.
Claude-mem externalizes that knowledge. It turns ephemeral session insights into persistent, searchable records. It's not just a productivity tool. It's an institutional memory system for your development practice.
I've been using it daily for months now, and it's become the critical first step in my complete feature workflow: memory recall before coding. I can't imagine going back to stateless sessions. The Groundhog Day problem is real, and once you've experienced sessions that start with actual context about your project's history, starting from zero feels like a regression.
If you're using Claude Code without any form of persistent memory, you're working harder than you need to. Every session where you re-explain something you've already figured out is wasted time. Every bug you debug twice is wasted effort. Claude-mem doesn't eliminate all of that, but it eliminates enough of it that the difference is immediately noticeable.
Give it a week. Let the observations accumulate. Search for a lesson when you're about to tackle something tricky. You'll wonder how you ever worked without it.
Related Posts
Ralph Loop: Running Claude Code Autonomously for Hours
The Ralph Wiggum technique turns Claude Code into an autonomous agent that keeps working until the job is done. Here's how it works, when to use it, and when it's a terrible idea.
Claude Code Isn't a Code Editor. It's a New Way to Use a Computer.
After a month of writing about Claude Code, here's the thing I keep coming back to: this isn't a developer tool. It's a new interface for computing.
Permissions, Security, and Trusting an AI with Your Codebase
Claude Code can edit files, run commands, and push to GitHub. The permission model determines what it can do and when. Here's how I think about trusting an AI agent with my code.