What 400+ Sessions Taught Me About Working with Claude Code
After hundreds of Claude Code sessions -- building production features, automating personal workflows, writing this entire blog series -- patterns emerge. Some things work reliably. Some things waste time reliably. I've shipped code, burned tokens, reverted disasters, and slowly built an intuition for what makes this tool sing versus what makes it stumble.
Here are the lessons that took the longest to learn, roughly in the order I wish someone had told me.
Context Beats Prompts
The biggest misconception about Claude Code is that output quality depends on prompt quality. People spend twenty minutes crafting the perfect prompt, agonizing over word choice, adding qualifiers and constraints and examples. They're optimizing the wrong thing.
Output quality depends on context quality.
A mediocre prompt with a great CLAUDE.md, relevant memory from claude-mem, and a well-organized codebase produces better results than a masterfully crafted prompt with no context. I've tested this repeatedly. A lazy one-liner like "add dark mode to the settings page" in a project with a thorough CLAUDE.md and good file structure will outperform a three-paragraph prompt in a project where Claude has to guess at every convention.
This is counterintuitive if you come from the ChatGPT world, where prompt engineering is treated like a discipline unto itself. In Claude Code, the prompt is just the last mile. Context is the highway. Your CLAUDE.md, your codebase structure, your memory layer, your subdirectory instructions -- that infrastructure does 80% of the work. The prompt fills in the remaining 20%.
Invest your time in CLAUDE.md, not in prompt engineering. I covered this in detail earlier in the series, but it bears repeating because it's the single most important lesson. Every minute spent improving your CLAUDE.md pays dividends across every future session. Every minute spent polishing a single prompt pays off exactly once.
Always Give Claude a Feedback Loop
Claude's output quality roughly doubles when it can verify its own work. This is not an exaggeration. It's the difference between a first draft and a finished product.
Writing code? Make it run the tests. Building UI? Make it check the browser via Playwright MCP. Generating a config? Make it validate the schema. Writing an MDX blog post? Make it run the build to catch compilation errors. The pattern is always the same: don't just write, write AND verify.
Claude without a feedback loop is a student writing an essay without proofreading. It'll produce something that looks right. It'll be confident about it. And there will be mistakes that ten seconds of verification would have caught. Missing imports. Off-by-one errors. A function that handles the happy path beautifully and throws on null input. Things that Claude would catch immediately if it ran the code, but misses when it's just writing into the void.
The practical implementation is straightforward. Put your build and test commands in CLAUDE.md. Tell Claude to run them after making changes. If you have Playwright set up, tell Claude to screenshot the page after UI changes. If you're working with APIs, tell Claude to hit the endpoint after implementing it. Close the loop.
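Concretely, the feedback-loop section of a CLAUDE.md might look something like this. The commands are illustrative placeholders; substitute your project's actual build and test scripts:

```markdown
## Verification (run after every change)

- Build: `npm run build` -- must pass before a task is considered done
- Tests: `npm test` -- run at least the test file covering the changed code
- UI changes: screenshot the affected page via the Playwright MCP and check the layout
- API changes: hit the endpoint with `curl` and confirm the response shape
```

Once this lives in CLAUDE.md, you don't have to remember to ask for verification; every session starts with the loop already defined.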
The sessions where I forget to close the loop -- where I let Claude write code without running it -- are consistently the sessions where I find bugs in review. The sessions where Claude runs the tests after every change are the ones where the output is clean on the first pass. The correlation is strong enough that I now treat "did I give Claude a way to verify this?" as a pre-flight checklist item.
Plan Mode Is Not Optional
For anything touching more than two files, planning before executing is strictly better. Every time. No exceptions. The math is simple: planning takes thirty seconds and prevents ten minutes of undoing.
The most expensive Claude Code session is one where Claude builds the wrong thing for fifteen minutes because you skipped the plan. It reads the prompt, makes an assumption about the approach, and starts implementing. It edits five files. It creates two new ones. It wires up imports. And then you look at the diff and realize it went in completely the wrong direction. Now you're either reverting everything or, worse, trying to salvage it by redirecting mid-stream, which produces Frankenstein code that nobody would have designed on purpose.
Plan mode costs almost nothing. Claude reads the relevant files, thinks about the approach, and writes out what it intends to do. You read the plan. If it's right, you let it execute. If it's wrong, you redirect before a single line of code is written. The cost is thirty seconds of reading a plan. The benefit is never building the wrong thing.
I was slow to adopt plan mode because it felt like unnecessary overhead. I was wrong. Now I use it for anything non-trivial, and the quality of the output is measurably better. Not because planning makes Claude smarter. Because it catches wrong assumptions before they become wrong implementations.
One Task Per Session
Sessions are cheap. Context is expensive. This is the lesson I wish I'd internalized from day one.
The failure mode looks like this: you start a session to fix a bug. Claude fixes it. Great. While you're here, you ask Claude to refactor that module you've been meaning to clean up. It does. Then you notice some tests are out of date, so you ask Claude to update them. Then you want to add a feature to the thing you just refactored. Four tasks deep, Claude's context is polluted with the artifacts of three previous tasks. Its attention is split. It's holding state from the bug fix, the refactor, the test updates, and now the new feature, all competing for context window space.
The result is sloppy work on task four that would have been clean if it were task one. Context degradation is real. It's not dramatic -- Claude doesn't suddenly become incompetent -- but the quality drifts downward as the session gets longer and more cluttered. Small mistakes creep in. Claude forgets a convention it followed perfectly at the start of the session. It references a file it read during task two that's no longer relevant.
The fix is simple. One task per session. Fix the bug, commit, new session. Refactor the module, commit, new session. Update the tests, commit, new session. Each task gets a fresh context with full capacity. The commit boundary is the natural session boundary.
This felt wasteful at first. Starting a new session takes fifteen seconds. But those fifteen seconds buy you a clean context every time. The alternative is spending fifteen minutes debugging quality issues that stem from context pollution. The math is not close.
Don't Trust, Verify
Claude is confident. This is both its greatest strength and its most dangerous quality.
It writes plausible-looking code that compiles. It explains its approach clearly and logically. It sounds right. And sometimes it does the wrong thing, in a way that's subtle enough to survive a casual glance at the diff. A function that works for the test cases but fails on edge cases. A SQL query that returns the right results on your dev data but has a performance cliff at scale. A component that renders correctly but has an accessibility issue.
I've caught bugs in Claude's output that I would have missed entirely if I hadn't been reading the diffs carefully. Not because Claude is bad at coding. Because Claude is good enough at coding that its mistakes look like intentional choices. The code doesn't smell wrong. It looks clean, well-structured, properly typed. And it has a bug.
The workflow that works for me: always read the diffs. Always run the build. Always check the edge cases Claude didn't mention. If Claude says "this handles all the cases," ask yourself which cases it might have missed. If Claude refactored something, verify the behavior didn't change. If Claude wrote a test, check that the test is actually testing what it claims to test.
The best mental model treats Claude's output as a first draft from a skilled colleague, not a final product from an infallible machine. You'd review a colleague's PR before merging it. Give Claude's output the same treatment.
Brainstorming Is the Highest-ROI Step
Two minutes of brainstorming prevents thirty minutes of building the wrong thing. This ratio has held up consistently across hundreds of sessions.
I use the Superpowers /brainstorm command before any significant piece of work. It forces two things to happen. First, I have to articulate what I actually want. "Make the settings page better" becomes "add dark mode toggle, persist the preference to localStorage, and update all CSS variables." The act of explaining the goal to Claude exposes the vagueness in my own thinking.
Second, Claude asks the questions I didn't think of. "Should the dark mode preference sync across devices?" "Should there be a system-preference option?" "Do you want a transition animation when toggling?" These aren't hypothetical examples. These are actual questions Claude asked during a real brainstorming session. Every one of them would have become a mid-implementation decision that interrupted my flow if I hadn't addressed them upfront.
Brainstorming is cheap. It's a two-minute conversation that produces a clear, shared understanding of what we're building. Skipping it is free in the moment and expensive in aggregate. Every time I skip brainstorming and jump straight to implementation, I regret it. Every time I spend two minutes brainstorming first, the implementation goes smoother. The pattern is unmistakable at this point.
Skills Beat Prompts for Repeated Tasks
If you've typed the same kind of prompt three times, you should have written a skill after the second.
A skill is a markdown file that defines a reusable workflow. It captures the exact instructions, the constraints, the output format, everything that makes the task work well. You write it once, and every future invocation uses those same battle-tested instructions.
The initial investment is ten minutes. The payoff is every future use where the instructions are perfect. No forgotten steps. No drifting requirements. No "oh, I forgot to mention that the output should be in this specific format."
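For a sense of what a skill file looks like: skills live as markdown files with YAML frontmatter (in recent Claude Code versions, under `.claude/skills/<name>/SKILL.md`). The commit skill below is a hypothetical sketch, not my exact file, and the frontmatter fields shown are the minimal ones:

```markdown
---
name: commit
description: Write a conventional commit for the currently staged changes
---

1. Run `git diff --staged` to see what is actually being committed.
2. Write a conventional-commit message: `type(scope): summary`, under 72 characters.
3. The body explains *why*, not *what* -- the diff already shows what.
4. Show the message for approval before running `git commit`.
```

The value is that step 3 -- the constraint you'd otherwise forget to mention half the time -- is now encoded once and applied every time.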
Prompts drift. You write a prompt from memory, and it's slightly different every time. You forget a constraint you mentioned last time. You phrase something differently, and Claude interprets it differently. The output is inconsistent across sessions because the input is inconsistent.
Skills are stable. The instructions are the same every time. The output format is the same every time. The constraints are the same every time. You wrote them once when you were thinking clearly, and now they execute perfectly every time without you needing to remember anything.
I have skills for committing code, reviewing PRs, writing blog posts, tracking calories, generating grocery lists, planning dates, building presentations. Each one started as a prompt I typed more than twice. Each one is now a one-word invocation that does the right thing every time. The accumulated time savings are enormous.
Memory Compounds
The /reflect workflow feels like overhead until you've been using it for a month. After each session that involves a mistake or a non-obvious solution, I save a lesson to claude-mem. "LESSON: database connection pool exhaustion happens when you forget to close transactions in error handlers." "LESSON: the Zustand devtools middleware must be the outermost wrapper or state updates don't show." Small, specific notes about things that went wrong or things that were surprisingly tricky.
At first, this felt like busywork. Writing down what went wrong, categorizing it, saving it. An extra thirty seconds per session for no immediate payoff.
Then I started searching for lessons at the beginning of sessions. "LESSON database" before working on database code. "LESSON auth" before touching authentication. And I started finding exactly the mistakes I'd made before. Not vague reminders, but specific, actionable warnings about the exact pitfalls in my codebase.
Every lesson saved is a future mistake prevented. The ROI is invisible for the first two weeks and enormous by week four. By month two, the memory layer feels essential. It's the difference between making the same mistake twice and making it once, learning from it, and never making it again. Humans are supposed to learn from experience. claude-mem makes sure that learning actually persists instead of evaporating when you close the terminal.
The compounding effect is real. Each lesson makes the next session in that domain slightly better. Over hundreds of sessions, the accumulated knowledge base turns Claude from a general-purpose agent into something that understands my specific codebase, my specific pitfalls, my specific preferences. That's not something you get from a better model. It's something you build over time.
Know When to Keep Claude on a Leash
There's a trust spectrum in Claude Code, and using the wrong level of autonomy for a given task always costs time. Always. In both directions.
Too little autonomy wastes time because you're micromanaging things Claude handles perfectly well on its own. Too much autonomy wastes time (and money) because Claude goes off-track with no one to redirect it.
The spectrum looks like this:
Autocomplete -- zero trust. You're driving. Claude suggests completions. This is appropriate for trivial edits where you already know what you're typing.
Interactive mode -- moderate trust. You prompt, Claude works, you review each change. This is where most work belongs. The human feedback loop catches mistakes early and redirects before they compound.
Plan mode -- higher trust. Claude explores, plans, and proposes. You review the plan, then trust the execution. This is the sweet spot for multi-file changes where the approach matters more than the individual edits.
Ralph Loop -- full autonomy. Claude works unsupervised for hours, iterating until the task is complete. This is for well-scoped, clearly defined tasks with measurable completion criteria. Implementation from a locked spec. Migration across dozens of files. Comprehensive test coverage.
Most work belongs in interactive or plan mode. Ralph Loop is for the 10% of tasks that are precisely defined enough to run unsupervised. Using Ralph Loop on a vague task burns money. Using interactive mode on a mechanical migration burns time. The skill is matching the task to the trust level.
I got this wrong for months. I'd Ralph Loop tasks that should have been interactive, spending money on loops that went off-track. I'd interactively babysit tasks that were mechanical and well-defined, wasting my attention on work Claude could have done autonomously. Calibrating the trust level for each task is one of the most important skills in working with Claude Code, and it only comes from experience.
The Terminal as Life OS
Here's the lesson I didn't expect: Claude Code is as good at personal automation as it is at coding.
This sounds like a stretch until you try it. Claude Code with custom skills and MCP servers is a general-purpose agent that happens to be really good at code. But "general-purpose" means it can do anything you can express as a terminal workflow. And that turns out to be a lot of things.
I track my daily calories from the terminal. I generate grocery lists based on my meal plan. I plan dates -- restaurants, activities, logistics -- by describing what I want and letting Claude handle the research. I build presentations from an outline. I draft messages. I organize my week. I covered these lifestyle skills earlier in the series, and they remain some of my most-used workflows.
Each of these is a skill. A markdown file that defines the workflow. The initial setup takes ten minutes per skill. After that, it's a slash command that does the right thing every time. No app to open. No UI to navigate. Just a prompt and a result.
The unexpected insight is that the terminal, with Claude Code as the agent layer, becomes a unified interface for your digital life. Instead of switching between ten apps with ten different UIs and ten different mental models, you have one interface that handles all of them. The command line has always been the most powerful interface on your computer. Claude Code makes it the most accessible one too.
I'm not suggesting everyone should track their calories from the terminal. But if you're already living in Claude Code for development, extending it to personal automation is nearly free. The infrastructure is already there. The skills system is already there. The agent capabilities are already there. You're just pointing them at different problems.
Claude Code Makes Good Engineers Better, Not Bad Engineers Good
This is the most controversial lesson, and I'm going to state it plainly.
The engineers who benefit most from Claude Code are the ones who already understand architecture, debugging, and code quality. They use Claude to move faster. They know what good code looks like, so they can evaluate Claude's output. They know what questions to ask, so they can steer Claude effectively. They understand system design, so they can plan before executing. Claude amplifies their existing skills.
Engineers who don't understand what good code looks like can't tell when Claude's output is wrong. They accept the first draft without scrutiny because they can't evaluate it. They don't know to ask about edge cases because they don't know which edge cases matter. They don't plan because they don't understand why the architecture matters. Claude produces code for them, but the code has the same quality problems they'd produce themselves, just faster.
AI makes the floor higher. A junior engineer with Claude Code produces better code than a junior engineer without it. But it doesn't change the ceiling. The best engineers with Claude Code are still the ones with the deepest understanding of software. The tool accelerates execution, not judgment. And judgment is the bottleneck for quality.
This isn't an argument against using Claude Code as a learning tool. It's great for learning. But learning requires the discipline to understand the code Claude writes, not just accept it. The engineers who grow fastest with Claude Code are the ones who read every diff, ask "why did you do it this way?" and occasionally push back when the approach is wrong. The ones who use it as a copy-paste machine don't grow at all.
If you want to get better at engineering, use Claude Code to accelerate your learning. If you want to get better at shipping, use Claude Code to accelerate your execution. But don't confuse the two. Shipping code you don't understand is technical debt with a timer on it.
The Meta-Lesson
After 400+ sessions, the lesson behind all the lessons is this: the tool gets better as you build the infrastructure around it.
CLAUDE.md makes every session start with better context. Skills make every repeated task execute perfectly. Memory makes every domain-specific session benefit from past experience. Hooks customize behavior at a system level. Custom commands create shortcuts for complex workflows. Each piece makes every future session more productive.
The investment is front-loaded. Writing your first CLAUDE.md takes an hour. Building your first skill takes fifteen minutes. Setting up claude-mem takes ten minutes. Configuring your first hook takes twenty minutes. It feels like overhead. It feels like yak-shaving. It feels like you should just be writing code.
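For the hooks piece specifically, here's the general shape of a post-edit hook in `.claude/settings.json`. Treat this as a sketch: the exact schema has changed across Claude Code versions, and the test command is a placeholder, so check the hooks documentation before copying it:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "npm test --silent"
          }
        ]
      }
    ]
  }
}
```

This wires the feedback-loop lesson into the system itself: every file edit triggers the test run automatically, whether or not anyone remembers to ask.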
But the payoff is permanent. Once the infrastructure exists, it works automatically. Every session is better than it would have been without it. The improvements compound across hundreds of sessions. The developer who spends a week setting up their Claude Code infrastructure -- I shared my exact setup earlier in the series -- will outperform the developer who just types prompts for the next six months.
This is the same pattern as any good tooling investment. The developers who spend time configuring their editor, their shell, their git workflow -- they're slower for a week and faster forever. Claude Code is the same. The raw tool is useful. The configured tool is transformative.
Four hundred sessions in, I'm still finding ways to make it better. Still adding rules to CLAUDE.md when I catch new mistakes. Still building skills for tasks I repeat. Still saving lessons that will prevent future errors. The system improves continuously because the infrastructure supports continuous improvement.
That's the real lesson. Don't just use the tool. Build the system around it. The terminal is patient, and the compound interest is real.