Plan Mode, Context Windows, and Not Wasting Tokens

The first time my Claude Code session degraded mid-task, I had no idea what was happening. I was deep into a refactor, maybe forty minutes in. Claude had been sharp the whole time. Reading files accurately, making surgical edits, keeping track of the overall plan. Then something shifted. The responses got vague. It started suggesting changes to files it had already modified. It forgot a constraint I'd given it ten minutes earlier. I thought something was broken. Turns out, something was. I'd run out of context.

It took me an embarrassing amount of time to understand what was actually going on. Once I did, it changed how I use Claude Code entirely.

The Context Window, Explained

Claude Code operates within a 200k token context window. Think of it as a shared workspace. Everything that happens in your session lives inside that workspace: your CLAUDE.md files that load at the start, every prompt you type, every response Claude generates, every file it reads, every tool output, every bash command result, every search result. All of it competes for the same 200,000 tokens.

To put that in perspective, 200k tokens is roughly 150,000 words, or about the length of two novels. That sounds like a lot. It is not.

A single large source file can be 2,000-3,000 tokens. A directory listing of a big project might be 500. Claude's own responses, especially when they include code, burn through tokens fast. A typical detailed response with code examples runs 800-1,500 tokens. When Claude reads a file, formats a plan, generates edits, and then you ask follow-up questions, the context fills up much faster than you'd expect.

When the context window is full, Claude auto-compacts. It summarizes older messages to free up space. The conversation continues, but the summarization is lossy. Details disappear. Specific code snippets get reduced to high-level descriptions. Nuanced decisions get flattened into generic summaries. The session keeps working, but it's working with a degraded version of its own memory.

Why This Matters Practically

Here's a scenario I've lived through more than once.

You're 80% through a complex refactor. You've established an architectural decision early in the session: "We're going to use the adapter pattern here because the existing interfaces are incompatible and we don't want to modify the upstream code." Claude understood this. It's been implementing accordingly for thirty minutes. Then the context compacts.

After compaction, Claude might not remember why you chose the adapter pattern. It might suggest modifying the upstream code directly, the exact thing you decided against. It might re-read files it already processed, burning more context on information it already had. It might contradict its own earlier plan because the plan was summarized into something too vague to be useful.

Context is Claude's working memory. This is why I wrote about context engineering being the real skill, not prompt engineering. When the window fills up and compaction kicks in, it's like someone replaced your whiteboard notes with a one-paragraph summary while you were getting coffee. The broad strokes are there. The specifics are gone. And in software engineering, the specifics are everything.

I've seen sessions where Claude generated a perfectly good plan, executed the first four steps flawlessly, then after compaction, repeated step two because it couldn't remember doing it. I've seen it introduce a bug that contradicted a constraint it had been respecting for the entire session. These aren't Claude being dumb. These are context management failures.

Plan Mode: The Single Most Important Habit

If you take one thing from this post, make it this: use plan mode.

The plan mode workflow: explore first, plan second, execute only after approval.

When you enter plan mode with the shift+tab shortcut or the plan mode toggle, Claude's behavior changes fundamentally. Instead of immediately editing files, it explores the codebase first. It reads files, searches for patterns, traces dependencies, and maps out the landscape. Then it formulates a plan and presents it to you. It doesn't touch anything until you approve.

This matters for four reasons.

It prevents wasted context on wrong approaches. Without plan mode, Claude might read three files, start editing the first one, realize it needs a different approach, undo the edit, read two more files, and try again. Every one of those steps consumed tokens. With plan mode, Claude does the exploration upfront and presents you with a coherent approach before spending context on edits. If the approach is wrong, you course-correct before any tokens are spent on implementation.

It forces Claude to think before acting. There's a meaningful quality difference between "Claude reads a file and immediately starts editing" and "Claude reads five files, considers the relationships, and then proposes a plan." The plan mode workflow produces better architectural decisions because Claude has more information when it makes them. It's the same reason you'd want an engineer to read the existing code before writing new code.

It gives you a checkpoint to correct course. When Claude presents a plan, you can review it. You can say "Step 3 won't work because of X" or "You're missing the constraint that Y." These corrections happen before any implementation context is spent. A two-sentence correction at the plan stage replaces a twenty-message back-and-forth during implementation.

The plan itself serves as an anchor. Once Claude writes out a plan, that plan exists in the context window as a reference. As the session progresses, Claude can look back at the plan to stay on track. This is especially valuable for multi-step tasks where it's easy to lose the thread. The plan acts as a persistent set of instructions within the session itself.

I use plan mode for anything that touches more than two files. The thirty seconds it takes to review a plan saves ten minutes of undoing bad edits. That's not an exaggeration. I've timed it.

For small, isolated changes, a single file fix or a quick addition, going straight to implementation is fine. But the moment a task involves coordination across files, architectural decisions, or any non-trivial refactoring, plan mode is mandatory. No exceptions.

Context Management Commands

Claude Code gives you three commands for managing context. Learn them. Use them.

/context -- Check your current token usage. This is your fuel gauge. I check it obsessively. Before starting a new subtask, I glance at context usage. If you're above 70%, you need to start thinking about your next move. Above 85%, you should be compacting or wrapping up the current task. Above 95%, you're in the danger zone where auto-compaction can kick in at the worst possible moment.

The specific numbers will vary depending on what you're doing. Heavy file reading burns context faster than conversation. But the principle is the same: know where you stand.

/compact -- Manually summarize the conversation. This is the controlled version of what auto-compaction does automatically. The difference is timing. When you compact manually, you choose the moment. You can compact after finishing a subtask but before starting the next one. You can compact after the planning phase but before implementation. You control what gets summarized and when.

Manual compaction is almost always better than auto-compaction because you're compacting at a natural boundary. The summary preserves the current state more accurately when it happens between tasks rather than in the middle of one.

You can also pass a message with compact to tell Claude what to focus on preserving. Something like /compact focus on the migration plan and the three remaining files helps Claude generate a summary that keeps the important details intact.

/clear -- The nuclear option. This wipes the entire session. The context resets to zero. CLAUDE.md files reload, but everything else is gone. Use this when you're starting a completely new task in the same project. Don't use it when you're mid-task, that's what compact is for.

I use /clear between unrelated tasks. If I just finished writing tests and now I want to work on a totally different feature, I clear. Starting fresh is better than carrying the baggage of an irrelevant conversation.

Strategies That Actually Work

Over the past several months of using Claude Code as my primary development tool, I've settled on a set of habits that keep sessions productive. None of these are clever. They're all obvious in retrospect. But I didn't start with them, and I wasted a lot of time before I figured them out.

One task per session. Don't ask Claude to fix a bug, then refactor a module, then write tests, then update documentation, all in the same session. Each of those tasks loads different files, generates different context, and requires different mental models. By the time you get to the tests, Claude's context is bloated with bug fix details it no longer needs. Start a new session for each distinct task. It's free. There's no reason not to.

Start with plan mode, then execute. For any non-trivial task, the workflow is: enter plan mode, let Claude explore and propose a plan, review and adjust the plan, then switch to implementation. The plan anchors the session. When context gets tight later, the plan is there as a reference point. This single habit probably saved me more time than everything else combined.

Compact proactively at 70-80% usage, not reactively at 95%. If you wait until context is nearly full, auto-compaction will beat you to it, and auto-compaction doesn't care that you're in the middle of a critical edit. Compacting at 70-80% gives you a clean slate for the remaining work while the session state is still coherent.

Use CLAUDE.md to front-load context. Anything you find yourself repeating across sessions belongs in a CLAUDE.md file. Project conventions, architectural decisions, naming patterns, testing requirements. These load automatically at the start of every session. They cost tokens, yes, but they're tokens well spent because they prevent the back-and-forth of re-explaining things Claude should already know.

I keep my CLAUDE.md files lean. They're not documentation. They're instructions. Short, direct statements that Claude can reference without burning through paragraphs of prose. Think of them as the briefing before the mission, not the mission report.

Break big tasks into phases. This is exactly what the GSD methodology formalizes with its phase planning and execution commands. If a task is too large for one session, don't try to force it. Finish phase 1, commit the code, write a brief summary of what was done and what's next, then start a new session for phase 2. The commit preserves the work. The summary (which you can put in CLAUDE.md or just paste into the new session) provides continuity. This approach is more reliable than trying to stretch one session across a massive change.

Be specific in your prompts. Vague prompts generate vague plans that require more back-and-forth, which burns more context. "Fix the auth bug" is worse than "The login endpoint returns 401 when the JWT is valid but expired. The refresh logic in src/auth/middleware.ts isn't triggering. Fix the expiry check." The second prompt lets Claude go straight to the relevant file instead of exploring the entire auth system.

What Auto-Compaction Actually Does

It's worth understanding the mechanics, because they explain why manual compaction is almost always preferable.

When the context window fills up, Claude automatically summarizes older messages to free tokens. The summary is generated by the model itself, so it's reasonably intelligent. It preserves the high-level intent of the conversation: what task you're working on, what the general approach is, what the current state of things looks like.

But it loses things. Specific code snippets get dropped. Exact error messages disappear. The reasoning behind a particular decision, "We chose approach A over approach B because of edge case X," gets compressed into "We're using approach A." The nuance evaporates.

This is particularly dangerous during implementation. If Claude is in the middle of editing file three of five and auto-compaction fires, it might lose the details of what it did in files one and two. It still knows it edited them, but it might not remember the specific changes. If file four depends on knowing exactly what changed in file two, you have a problem.

Manual compaction at the right time beats auto-compaction at the wrong time. Always.

The best time to compact is at a natural boundary: after finishing a subtask, after the planning phase, after a commit. At these points, the session state is clean. The important information is the current plan and the current state of the code, both of which survive summarization well. The expendable information is the step-by-step conversation that got you here, which is exactly what compaction is designed to drop.

The Boring Fundamental

Context management is not exciting. Nobody's going to make a viral post about checking their token usage before starting a new subtask. There's no drama in hitting /compact after finishing a planning phase.

But it's the difference between sessions that stay sharp for an hour and sessions that degrade after twenty minutes. It's the difference between Claude remembering your architectural constraints and Claude suggesting the exact thing you told it not to do. It's the difference between finishing a refactor in one session and having to restart because the context got corrupted.

I think about context management like memory management in C. You can ignore it and things will probably work for small programs. But the moment your program gets complex, the moment you're managing real state across real operations, you either manage memory deliberately or you get bugs that are nearly impossible to diagnose. Context management in Claude Code is the same. The tool is powerful enough that casual usage works fine for simple tasks. But for the work that actually matters, the complex refactors, the multi-file features, the architectural changes, context management is what determines whether the session succeeds or fails.

Plan mode, proactive compaction, one task per session. These are the boring fundamentals. They work. I revisit these themes in my 400+ sessions retrospective, where they rank among the most important lessons I've learned.