Subagents and Parallel Execution: Making Claude Code 5x Faster
The default Claude Code workflow is sequential, and for many tasks that's perfectly adequate -- I covered the basics of this interactive workflow in the first week of this series. Read a file. Think about it. Edit it. Read the next file. Think about that. Edit that too. For single-file changes, this works. Fix a typo, rename a variable, add a loading spinner -- sequential is fast enough that you don't notice the bottleneck.
But the moment your task touches multiple independent things, sequential falls apart. Need to understand how three different subsystems interact? That's three sequential research passes, each waiting for the last to finish. Need to implement five independent components? That's five sequential implementation cycles. Need to write tests for four different modules? Four sequential rounds of reading, writing, and verifying.
The wall-clock time scales linearly with the number of subtasks. Five independent things take five times as long, even though they have no dependencies on each other. You're watching a single worker do five jobs in a row when five workers could have done them all at once.
Subagents fix this. They let Claude Code spawn parallel workers -- autonomous agents that each handle a piece of the task simultaneously. The parent coordinates. The workers execute. And what took 25 minutes sequentially finishes in 5.
What Subagents Actually Are
When Claude Code encounters a task that can be parallelized, it spawns autonomous worker agents via the Task tool. Each agent gets its own context window, its own set of tool permissions, and works independently of the others. The parent agent is the coordinator. It decides what to delegate, constructs the instructions for each worker, launches them, waits for results, and synthesizes the output.
This is not multithreading in the traditional sense. Each subagent is a separate invocation of the model with its own conversation context. They don't share memory. They don't communicate with each other directly. They receive instructions from the parent, do their work, and return results. The parent is the only one that sees the full picture.
Think of it like a manager delegating to a team. The manager doesn't do the work. The manager decides who does what, provides each person with the context they need, collects the deliverables, and assembles them into a coherent whole. The quality of the output depends heavily on how well the manager decomposes the task and how clearly the instructions are written. Claude Code handles this decomposition automatically, but understanding how it works lets you guide it toward better parallel strategies.
The Task tool call includes a description of what the subagent should do, what tools it's allowed to use, and any relevant context from the parent's conversation. When the subagent finishes, its output flows back to the parent as a single message. The parent never sees the subagent's intermediate steps -- just the final result. This keeps the parent's context window clean and focused on coordination rather than execution details.
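The coordination pattern -- launch workers, wait, collect only final results -- can be sketched in plain Python. This is not Claude Code's actual Task tool API; it's a minimal model of the parent/worker shape using a thread pool, with hypothetical task strings standing in for subagent instructions.

```python
from concurrent.futures import ThreadPoolExecutor

def worker(instructions: str) -> str:
    # Stand-in for a subagent invocation. In Claude Code this would be a
    # Task tool call with its own context window; here we just return a
    # "final result" -- the parent never sees intermediate steps.
    return f"result for: {instructions}"

def parent(subtasks: list[str]) -> list[str]:
    # The parent launches all workers before waiting on any of them,
    # then collects each worker's single result message.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(worker, t) for t in subtasks]
        return [f.result() for f in futures]

results = parent(["research auth", "research billing", "research search"])
```

The point of the sketch is the shape: workers share nothing, and the parent's "context" holds only the three result strings, not the work that produced them.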
Agent Types: Match the Worker to the Job
Not all subagents are created equal. Different tasks need different capabilities, and using the right agent type for the job matters for both speed and cost.
Explore agents are fast and read-only. They can glob for files, grep for patterns, and read file contents, but they cannot edit anything. They cannot run bash commands. They cannot write files. This constraint is the point. When you need to research a part of the codebase -- understand how a module works, find where a function is used, trace a dependency chain -- you want an agent that reads quickly and reports back. Explore agents are cheaper because they use fewer tool calls and can't accidentally break anything. Use them for investigation, research, and understanding.
Plan agents are also read-only but focused on architecture and strategy. They read the codebase, analyze the problem, and return an implementation plan -- which files to modify, what changes to make, what order to do them in. They don't touch anything. They think and report. Use them when you need a strategy before you start building.
General-purpose agents have the full toolkit. File editing, bash commands, file writing -- everything the parent can do. These are the workers you spawn when actual implementation needs to happen. They can create files, modify existing code, run tests, install packages, and do anything else a normal Claude Code session can do. Use them for building, not for researching.
Beyond these three core types, there are specialized agents tuned for specific review tasks -- code-reviewer, silent-failure-hunter, type-design-analyzer, pr-test-analyzer, comment-analyzer -- which I covered in the PR Review Toolkit post. Those are special-purpose agents designed for a narrow class of analysis work.
The key insight is to match the agent type to the task. Don't spawn a general-purpose agent for a read-only research task. Use Explore. It starts faster, costs less, and can't accidentally modify files while it's supposed to be just reading. Don't use an Explore agent when you need to edit files. It will fail and waste time. The type system exists for a reason. Use it.
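The matching rule above is mechanical enough to write down. This helper is hypothetical -- Claude Code chooses agent types itself -- but it encodes the decision order described in this section: anything that writes needs general-purpose, read-only strategy work goes to Plan, and everything else defaults to Explore.

```python
def pick_agent_type(needs_edits: bool, needs_plan: bool) -> str:
    # Hypothetical helper encoding the rule of thumb from the text.
    if needs_edits:
        return "general-purpose"  # full toolkit: edits, bash, file writes
    if needs_plan:
        return "plan"             # read-only, returns a strategy
    return "explore"              # read-only, fast, cheap research
```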
Foreground vs. Background Execution
Subagents can run in two modes: foreground and background.
Foreground agents block the parent until they return. The parent sends the task, waits for the result, and then continues. This is the default. Use foreground when the parent needs the subagent's output to proceed. If you're spawning a research agent to understand a module before writing code that depends on that module, the research has to finish first. Foreground is the right choice.
Background agents run independently. The parent spawns them and continues working on other things. When the background agent finishes, the parent gets notified. Use background when you have genuinely independent work that doesn't gate anything else. If you're implementing a feature and simultaneously want documentation written, the doc-writing agent can run in the background while the parent focuses on implementation.
The distinction matters for throughput. If you have five independent tasks and run them all as foreground agents sequentially, you get no parallelism benefit. You need to launch them and let them run concurrently. The Task tool handles this -- when the parent launches multiple subagents before waiting for any of them, they execute in parallel.
In practice, most parallelization uses foreground agents launched together. The parent fires off five tasks, then waits for all five to return. Each one runs independently, and the parent collects results as they come in. Background agents are more useful for long-running tasks that the parent doesn't need to coordinate with -- things like generating documentation, running comprehensive test suites, or doing codebase-wide analysis while the main work continues.
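The throughput difference between "launch, wait, launch, wait" and "launch all, then wait" is easy to demonstrate. This sketch uses Python threads and `time.sleep` as a stand-in for agent work -- the timings are illustrative, not Claude Code measurements.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def task(seconds: float) -> float:
    time.sleep(seconds)  # stand-in for a subagent doing real work
    return seconds

durations = [0.1] * 5

# Sequential: total time is the sum of all task durations.
start = time.perf_counter()
for d in durations:
    task(d)
sequential = time.perf_counter() - start

# Parallel: fire off all five before waiting on any of them.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    list(pool.map(task, durations))
parallel = time.perf_counter() - start

print(f"sequential {sequential:.2f}s, parallel {parallel:.2f}s")
```

Five 0.1-second tasks take about 0.5 seconds sequentially and roughly 0.1 seconds when launched together -- the same shape as five research agents returning at once.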
Git Worktree Isolation: The Real Superpower
Here's where things get interesting. The biggest challenge with parallel agents editing files is conflicts. If two agents try to modify the same file simultaneously, you get a mess. If agent A reads a file, agent B modifies that same file, and then agent A writes its changes, agent B's work gets overwritten.
Git worktree isolation solves this completely.
When you spawn a subagent with isolation: "worktree", Claude Code creates a separate git worktree for that agent. A git worktree is a full, independent copy of your repository at a specific commit, with its own working directory and its own branch. The agent works in its own isolated copy. It can edit any file it wants without affecting the parent's working directory or any other agent's working directory.
Each agent gets its own branch. Agent A works on branch-agent-a. Agent B works on branch-agent-b. They can both modify src/components/Header.tsx if they need to, and there's no conflict because they're editing different copies. When the agents finish, their branches can be merged back into the main branch, and standard git merge resolution handles any overlapping changes.
This is how you implement five independent components in parallel without anyone stepping on anyone else's toes. Each agent has a complete, isolated environment. They can run tests. They can build the project. They can do anything a normal development session can do, all happening simultaneously in separate worktrees.
The overhead of creating a worktree is minimal -- a few seconds. The benefit is that you can parallelize work that would otherwise be impossible to parallelize due to file conflicts. For any task where multiple agents need to edit files, worktree isolation is not optional. It's required.
When the agents finish, the cleanup is automatic. The worktree is either merged into the parent branch or discarded, depending on the outcome. You don't need to manage branches or worktrees manually. The orchestration handles it.
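The orchestration hides all of this, but the underlying git mechanics are ordinary `git worktree` commands. Here's a manual walkthrough in a throwaway repo -- the paths, branch names, and file are made up for illustration.

```shell
# Create a temporary repo to demonstrate (Claude Code's orchestration
# normally handles every step of this automatically).
cd "$(mktemp -d)"
git init -q repo && cd repo
git -c user.email=a@b -c user.name=demo commit -q --allow-empty -m init

# One isolated worktree per agent, each on its own branch.
git worktree add -q ../agent-a -b branch-agent-a
git worktree add -q ../agent-b -b branch-agent-b

# Both agents edit the same path in their own copies -- no conflict.
echo "change A" > ../agent-a/Header.tsx
echo "change B" > ../agent-b/Header.tsx

# When an agent finishes, its branch merges back and the worktree goes away.
(cd ../agent-a && git add . \
  && git -c user.email=a@b -c user.name=demo commit -q -m "agent A")
git merge -q branch-agent-a
git worktree remove ../agent-a
git worktree list
```

After the merge, agent A's change is on the parent branch; agent B's worktree is untouched and can merge (or be discarded) independently.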
A Real Example: Writing Five Blog Posts in Parallel
Let me describe something concrete. This blog series -- the Month of Claude Code -- is 20 posts. Writing them sequentially would mean writing one post, finishing it, starting the next, finishing that, and so on. Each post takes around five minutes of Claude's time when given a detailed outline.
Instead, the parent agent received outlines for five posts. It spawned five general-purpose subagents, each given a specific outline and a specific output file path. Post 1 writes to post-one.mdx. Post 2 writes to post-two.mdx. And so on. No file conflicts because each agent writes to a unique path.
All five agents ran simultaneously. Each one researched the topic (reading relevant files from the codebase for technical accuracy), wrote the post, and returned it. The parent collected all five results and verified consistency -- making sure cross-references between posts were accurate, terminology was consistent, and no two posts covered the same ground.
Five posts that would take 25 minutes sequentially finished in about 5 minutes of wall-clock time. Same total token cost. Same quality. One-fifth the wait.
Here's another example: codebase research. I needed to understand how three different subsystems interacted -- the blog rendering pipeline, the project showcase system, and the OS-style window manager. These are independent modules with minimal overlap. I spawned three Explore agents in parallel, each investigating one subsystem. Each agent traced the component tree, identified the key files, mapped the data flow, and summarized how the subsystem worked.
Three research reports came back simultaneously instead of sequentially. The parent synthesized them into a unified understanding of how the three systems relate. Total time: about 90 seconds. Sequential time: about four and a half minutes. Not a dramatic absolute saving, but a dramatic relative one, and the pattern scales. Ten subsystems? Ten parallel agents. The wall-clock time stays roughly constant.
When to Parallelize and When Not To
The rule is straightforward: if tasks are independent, parallelize them. If task B needs the output of task A, run them sequentially.
Parallelize these:
- Researching different files or modules
- Implementing components that don't share files
- Writing tests for independent modules
- Reviewing different aspects of a PR (this is exactly what the PR Review Toolkit does)
- Generating documentation for separate features
- Running analysis on unrelated parts of the codebase
Keep these sequential:
- Plan, then implement (implementation depends on the plan)
- Implement, then test (tests need the implementation to exist)
- Read configuration, then use the configuration values
- Create a schema, then write code that depends on that schema
- Any task where the output of step N is the input to step N+1
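The two lists above amount to a scheduling rule: group tasks into waves, where everything in a wave is independent and each wave gates the next. Here's a small sketch of that grouping -- the task names are hypothetical, and this is a generic topological-level computation, not anything Claude Code exposes.

```python
def parallel_waves(deps: dict[str, set[str]]) -> list[set[str]]:
    """Group tasks into waves: everything within a wave can run in
    parallel; waves run sequentially because each depends on the last."""
    remaining = {t: set(d) for t, d in deps.items()}
    waves = []
    while remaining:
        ready = {t for t, d in remaining.items() if not d}
        if not ready:
            raise ValueError("circular dependency")
        waves.append(ready)
        for t in ready:
            del remaining[t]
        for d in remaining.values():
            d -= ready  # these dependencies are now satisfied
    return waves

# Plan gates implementation; the three components are independent;
# integration waits for all of them.
waves = parallel_waves({
    "plan": set(),
    "component-a": {"plan"},
    "component-b": {"plan"},
    "component-c": {"plan"},
    "integrate": {"component-a", "component-b", "component-c"},
})
```

The result is three waves -- plan, then three components in parallel, then integrate -- which is exactly the plan-then-implement-then-integrate shape described below.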
The mistake I see people make is trying to parallelize things that have hidden dependencies. "Implement the API endpoint and the frontend component in parallel" sounds reasonable until the frontend component needs to know the exact shape of the API response. If you parallelize that, either one agent guesses at the interface (and gets it wrong) or you add a synchronization step that defeats the purpose of parallelizing.
The safe pattern is: parallelize research, parallelize independent implementation, but keep plan-then-implement and implement-then-integrate sequential. When in doubt, sequential is always correct. It's just slower.
Cost and Performance Trade-offs
A question I get asked: do subagents cost more?
The token cost is the same. Five parallel agents processing 1,000 tokens each costs the same as one agent processing 5,000 tokens total. You're not paying a parallelism premium. The tokens are the tokens regardless of how many agents consume them.
What you gain is wall-clock time. Five agents running in parallel finish in roughly one-fifth the time of one agent doing all five tasks sequentially. In practice, the speedup isn't perfectly linear -- there's overhead for spawning agents, serializing context, and merging results -- but for tasks that take more than a few seconds each, the speedup is substantial. 3-5x faster is typical.
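The sub-linear speedup falls out of simple arithmetic. The overhead figure below is a hypothetical illustration (half a minute for spawning, context serialization, and merging), not a measured Claude Code number.

```python
import math

def wall_clock(task_minutes: float, n_tasks: int, workers: int = 5,
               overhead_minutes: float = 0.5) -> tuple[float, float]:
    # Hypothetical model: parallel runs pay a fixed overhead on top of
    # the longest chain of tasks any single worker handles.
    sequential = task_minutes * n_tasks
    parallel = math.ceil(n_tasks / workers) * task_minutes + overhead_minutes
    return sequential, parallel

seq, par = wall_clock(task_minutes=5, n_tasks=5)
# 25 minutes sequential vs 5.5 minutes parallel: ~4.5x, not a clean 5x
```

With these assumed numbers, five 5-minute tasks give a 4.5x speedup rather than 5x -- the overhead eats the difference, which matches the "3-5x is typical" observation.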
You can also optimize cost by using different models for different subagents. Simple tasks -- file reading, pattern matching, straightforward text generation -- can use a smaller, faster, cheaper model like Haiku. Complex tasks -- architectural analysis, nuanced code generation, subtle bug detection -- should use the full Opus model. This lets you allocate your token budget where it matters most.
A research agent that just needs to find files and summarize their contents doesn't need Opus-level reasoning. Haiku handles that fine at a fraction of the cost. But a code review agent that needs to detect subtle type safety issues or silent failure patterns benefits from Opus's deeper analysis. Mixing models across subagents gives you the best of both worlds: fast and cheap for simple work, powerful and thorough for complex work.
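The model-per-task idea is just a routing table. The mapping below is illustrative -- which model actually fits which task is a judgment call, and the task-kind labels are made up for this sketch.

```python
# Hypothetical routing table: cheap model for mechanical work,
# expensive model for work that needs deep analysis.
MODEL_FOR = {
    "research": "haiku",        # find files, summarize contents
    "doc-generation": "haiku",  # straightforward text generation
    "code-review": "opus",      # subtle type-safety / silent-failure hunting
    "architecture": "opus",     # nuanced design analysis
}

def model_for(task_kind: str, default: str = "sonnet") -> str:
    # Fall back to a mid-tier model when the task kind isn't classified.
    return MODEL_FOR.get(task_kind, default)
```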
Agent Teams: The Experimental Frontier
Beyond individual subagents, there's an experimental feature called Agent Teams. This is a step beyond "parent spawns workers" into genuine multi-agent collaboration.
With Agent Teams, you create a named team using TeamCreate, add multiple agents to it, and assign tasks via a shared task list. The key difference from regular subagents: team members can message each other directly. They don't have to route everything through the parent. Agent A can ask Agent B a question, get a response, and continue working.
This enables workflows that regular subagents can't handle well. Consider a scenario where one agent is implementing a backend API and another is building the frontend that consumes it. With regular subagents, they'd either work independently (and hope their assumptions about the interface match) or the parent would have to manually relay information between them. With Agent Teams, the frontend agent can message the backend agent: "What's the response shape for the user endpoint?" and get an immediate answer.
Agent Teams require the CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS feature flag, which I enable in my exact setup. This is experimental for a reason -- the coordination overhead is significant, the messaging patterns can get complex, and it's easy to create circular dependencies where agents are waiting on each other. But for large, multi-component projects where different parts of the system need to stay in sync during implementation, teams are more capable than independent subagents.
I've used teams for a handful of projects. The results are promising but inconsistent. When the task decomposition is clean and the communication patterns are simple, teams produce excellent results fast. When the task decomposition is messy and agents start asking each other clarifying questions in loops, things slow down. I'd recommend starting with regular subagents and only reaching for teams when you hit coordination problems that subagents can't solve.
Practical Tips for Effective Parallelization
After months of using subagents daily, here's what I've learned about getting the most out of them.
Be explicit about what each agent should return. Vague instructions produce vague results. "Research the auth system" is worse than "Read all files in src/auth, identify every public function, and return a summary of what each one does with its parameters and return types." The more specific the deliverable, the more useful the result.
Don't over-parallelize. Spawning 20 agents for 20 tiny tasks has more overhead than benefit. The sweet spot is 3-7 agents for meaningfully sized subtasks. If a task takes less than 10 seconds for a single agent, it's probably not worth the overhead of spawning a subagent for it.
Use Explore agents aggressively for research. Before implementing anything complex, spawn Explore agents to investigate the relevant parts of the codebase. The research results feed into better implementation. This is the "understand before you build" principle, but parallelized.
Worktree isolation is mandatory for multi-agent edits. If more than one agent will modify files, use worktrees. No exceptions. The cost of a worktree (a few seconds of setup) is negligible compared to the cost of a file conflict (lost work, corrupted state, wasted time debugging).
Check for hidden dependencies before parallelizing. The most common parallelization failure is two tasks that looked independent but actually share a dependency -- a config file, a shared utility, a database migration that needs to run before both can proceed. Spend 30 seconds thinking about dependencies before launching parallel agents. It saves minutes of debugging.
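That 30-second dependency check can be partly mechanized for the most common case: two "independent" tasks that touch the same file. This is a sketch with hypothetical task names and paths, not a built-in Claude Code check.

```python
from itertools import combinations

def shared_files(task_files: dict[str, set[str]]) -> dict[tuple, set[str]]:
    """Flag pairs of supposedly independent tasks that actually touch
    the same files -- a cheap pre-flight check before launching agents."""
    conflicts = {}
    for a, b in combinations(task_files, 2):
        overlap = task_files[a] & task_files[b]
        if overlap:
            conflicts[(a, b)] = overlap
    return conflicts

conflicts = shared_files({
    "frontend": {"src/App.tsx", "src/config.ts"},
    "backend": {"src/api.ts", "src/config.ts"},
    "docs": {"README.md"},
})
```

Here the frontend and backend tasks both touch src/config.ts -- either sequence them, give each a worktree, or split the shared file out into its own prior step.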
Closing
Subagents are the difference between Claude Code as a single-threaded tool and Claude Code as a parallel workforce. For any task where you can identify independent subtasks -- and most non-trivial tasks have them -- spawning workers is strictly better than doing everything sequentially.
The mental model shift is from "I have one very fast assistant" to "I have a team that can split up and work simultaneously." That shift changes what's practical. Tasks that would take too long to be worth doing become fast enough to do routinely. Research that you'd skip because it would eat 15 minutes becomes trivial when three Explore agents handle it in 90 seconds. Multi-file implementations that felt like a slog become something you can knock out during a coffee break.
This is the capability that makes Claude Code feel less like a tool and more like a team. A tool does one thing at a time. A team splits up, tackles the pieces in parallel, and comes back together with the results. Subagents give you the team. For truly autonomous long-running work, you can combine subagents with Ralph Loop for maximum leverage -- parallel workers running inside an autonomous execution loop.
Related Posts
Hooks, Statuslines, and the Automation Layer Nobody Talks About
Hooks let you run shell commands when Claude Code starts, stops, or uses tools. Combined with a custom statusline, they turn Claude Code into a self-monitoring, self-correcting system.
Custom Commands and Slash Commands: Building Your Own Claude Code CLI
Slash commands turn Claude Code into a personalized CLI. A markdown file becomes a reusable workflow you invoke with a single slash. Here's how to build them.
Shipping a Feature in 45 Minutes: My Claude Code Workflow End to End
From memory recall to brainstorm to plan to execution to review to commit. Here's every step of building a real feature with Claude Code, with the actual workflow that makes it fast.