Claude Code Pattern 5: Tool Orchestration and Execution
Introduction
In the previous chapter, we examined the permission system and saw how it evaluates tool use requests before execution. We learned about the multi-mode permission architecture, the evaluation pipeline, security classifiers, and how the system builds trust through transparency. But once a tool use is approved, the harness must actually execute it. This execution is far from trivial — when the model produces multiple tool_use blocks in a single response, the harness must decide which tools can run concurrently, which must run serially, how to handle progress updates, and how to recover from failures. This is the responsibility of the tool orchestration layer.
Tool orchestration is the bridge between model reasoning and real-world action. The model produces tool_use blocks that represent its intentions — read this file, write that file, execute this command, spawn a subagent. The orchestration layer transforms these intentions into actual operations, managing concurrency, propagating state changes, and ensuring that every tool_use has a matching tool_result. Without effective orchestration, even a perfectly designed permission system would fall short: the agent would be safe but slow, unable to handle the complexity of real-world multi-step tasks.
This chapter examines the tool orchestration layer in detail, showing how the harness partitions tool calls into batches, executes them efficiently, tracks progress in real time, and handles failures gracefully. We will see how the partitioning algorithm maximizes parallelism while ensuring safety, how the streaming tool executor begins running tools as their blocks arrive mid-response, how progress callbacks enable real-time feedback, and how synthetic tool_result blocks maintain invariants when executions are interrupted.
The Tool Orchestration Layer
Once the model produces tool_use blocks, the harness must execute them efficiently and safely. The orchestration layer handles batching, scheduling, context propagation, and progress reporting. The core function is runTools, which is an async generator that yields messages and updated context as tools execute.
The function signature reveals the key responsibilities:
// src/services/tools/toolOrchestration.ts
export async function* runTools(
  toolUseMessages: ToolUseBlock[],
  assistantMessages: AssistantMessage[],
  canUseTool: CanUseToolFn,
  toolUseContext: ToolUseContext,
): AsyncGenerator<MessageUpdate, void>
The function receives the tool_use blocks from the model’s response, the assistant messages that contained them, the permission checking function, and the current tool use context. It yields MessageUpdate objects that contain both the message to display and the updated context after each tool execution.
The core implementation follows a clear pattern:
// src/services/tools/toolOrchestration.ts
let currentContext = toolUseContext
for (const { isConcurrencySafe, blocks } of partitionToolCalls(
  toolUseMessages,
  currentContext,
)) {
  if (isConcurrencySafe) {
    // Run read-only batch concurrently
    const queuedContextModifiers: Record<
      string,
      ((context: ToolUseContext) => ToolUseContext)[]
    > = {}
    for await (const update of runToolsConcurrently(
      blocks,
      assistantMessages,
      canUseTool,
      currentContext,
    )) {
      if (update.contextModifier) {
        const { toolUseID, modifyContext } = update.contextModifier
        if (!queuedContextModifiers[toolUseID]) {
          queuedContextModifiers[toolUseID] = []
        }
        queuedContextModifiers[toolUseID].push(modifyContext)
      }
      yield { message: update.message, newContext: currentContext }
    }
    // Apply queued context modifiers after batch completes
    for (const block of blocks) {
      const modifiers = queuedContextModifiers[block.id]
      if (!modifiers) continue
      for (const modifier of modifiers) {
        currentContext = modifier(currentContext)
      }
    }
    yield { newContext: currentContext }
  } else {
    // Run non-read-only batch serially
    for await (const update of runToolsSerially(
      blocks,
      assistantMessages,
      canUseTool,
      currentContext,
    )) {
      if (update.newContext) {
        currentContext = update.newContext
      }
      yield { message: update.message, newContext: currentContext }
    }
  }
}
The algorithm iterates through batches produced by the partitioning function. For concurrency-safe batches, it runs tools in parallel and queues context modifiers to apply after the batch completes. For non-safe batches, it runs tools serially and applies context changes immediately. This design maximizes parallelism while maintaining correctness.
Tool Call Partitioning
The partitionToolCalls function groups tool calls into batches based on concurrency safety. It examines each tool_use block, looks up the corresponding tool, and calls isConcurrencySafe to determine whether the tool can run in parallel with others.
The implementation uses a reduce pattern to build batches:
// src/services/tools/toolOrchestration.ts
function partitionToolCalls(
  toolUseMessages: ToolUseBlock[],
  toolUseContext: ToolUseContext,
): Batch[] {
  return toolUseMessages.reduce((acc: Batch[], toolUse) => {
    const tool = findToolByName(toolUseContext.options.tools, toolUse.name)
    const parsedInput = tool?.inputSchema.safeParse(toolUse.input)
    const isConcurrencySafe = parsedInput?.success
      ? (() => {
          try {
            return Boolean(tool?.isConcurrencySafe(parsedInput.data))
          } catch {
            return false
          }
        })()
      : false
    if (isConcurrencySafe && acc[acc.length - 1]?.isConcurrencySafe) {
      acc[acc.length - 1]!.blocks.push(toolUse)
    } else {
      acc.push({ isConcurrencySafe, blocks: [toolUse] })
    }
    return acc
  }, [])
}
The function is conservative — if parsing fails or isConcurrencySafe throws, the tool is treated as not concurrency-safe. This ensures that malformed inputs or buggy tool implementations cannot accidentally cause parallel execution of tools that should run serially.
The batching logic produces intuitive results. Consider a sequence of tool calls: Read(file1), Read(file2), Write(file3), Read(file4), Bash(rm). The partitioning produces four batches. The first batch contains the two concurrent read operations. The second batch contains the write operation alone. The third batch contains the fourth read. The fourth batch contains the destructive bash command. This maximizes parallelism — the two reads run together — while ensuring that writes and destructive operations run serially.
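The batching behavior can be demonstrated with a small, self-contained re-implementation of the same reduce pattern. The `Call` type and `safe` flag below are illustrative stand-ins for the real tool lookup and `isConcurrencySafe` check:

```typescript
// Minimal sketch of the batching logic from partitionToolCalls, applied
// to the Read/Read/Write/Read/Bash example from the text.
type Call = { name: string; safe: boolean }
type Batch = { isConcurrencySafe: boolean; blocks: Call[] }

function partition(calls: Call[]): Batch[] {
  return calls.reduce((acc: Batch[], call) => {
    const last = acc[acc.length - 1]
    // Only consecutive concurrency-safe calls merge into one batch;
    // every non-safe call opens a batch of its own.
    if (call.safe && last?.isConcurrencySafe) {
      last.blocks.push(call)
    } else {
      acc.push({ isConcurrencySafe: call.safe, blocks: [call] })
    }
    return acc
  }, [])
}

const batches = partition([
  { name: 'Read(file1)', safe: true },
  { name: 'Read(file2)', safe: true },
  { name: 'Write(file3)', safe: false },
  { name: 'Read(file4)', safe: true },
  { name: 'Bash(rm)', safe: false },
])

// Four batches: [both reads], [write], [read], [bash]
console.log(batches.map(b => b.blocks.map(c => c.name)))
```

Note that consecutive non-safe calls never merge either: each one gets its own single-element batch, which keeps the serial path simple.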
Concurrent Tool Execution
Concurrency-safe tools run in parallel with a configurable limit. The getMaxToolUseConcurrency function reads from an environment variable with a default of 10:
// src/services/tools/toolOrchestration.ts
function getMaxToolUseConcurrency(): number {
  return parseInt(process.env.CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY || '', 10) || 10
}
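The `|| 10` fallback relies on `parseInt` returning `NaN` for empty or non-numeric input, and on `NaN` being falsy. A side effect worth noting: an explicit `0` is also falsy, so it falls back to the default too. A small sketch (the `maxConcurrency` helper is illustrative, not part of the harness):

```typescript
// Mirrors the env-var parsing: NaN (and 0) are falsy, so || yields 10.
function maxConcurrency(envValue: string | undefined): number {
  return parseInt(envValue || '', 10) || 10
}

console.log(maxConcurrency(undefined)) // 10 (unset)
console.log(maxConcurrency('abc'))     // 10 (non-numeric parses to NaN)
console.log(maxConcurrency('4'))       // 4
console.log(maxConcurrency('0'))       // 10 (0 is falsy, so it also falls back)
```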
The concurrent execution function uses an all generator that runs multiple async generators simultaneously:
// src/services/tools/toolOrchestration.ts
async function* runToolsConcurrently(
  toolUseMessages: ToolUseBlock[],
  assistantMessages: AssistantMessage[],
  canUseTool: CanUseToolFn,
  toolUseContext: ToolUseContext,
): AsyncGenerator<MessageUpdateLazy, void> {
  yield* all(
    toolUseMessages.map(async function* (toolUse) {
      toolUseContext.setInProgressToolUseIDs(prev =>
        new Set(prev).add(toolUse.id),
      )
      yield* runToolUse(
        toolUse,
        assistantMessages.find(_ =>
          Array.isArray(_.message.content) && _.message.content.some(
            _ => _.type === 'tool_use' && _.id === toolUse.id,
          ),
        )!,
        canUseTool,
        toolUseContext,
      )
      markToolUseAsComplete(toolUseContext, toolUse.id)
    }),
    getMaxToolUseConcurrency(),
  )
}
The all generator is key to real-time responsiveness. It yields results as they become available, not in input order. This means if the first tool takes 5 seconds but the second takes 1 second, the user sees the second tool’s result after 1 second rather than waiting for both to complete. This out-of-order yielding is essential for a responsive user experience when tools have varying execution times.
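A combinator with this behavior can be sketched as follows. This is not the harness's actual implementation, just a minimal version that demonstrates the two properties the text describes: a concurrency cap and completion-order yielding (assuming `limit >= 1`; error handling omitted):

```typescript
// Sketch of an `all`-style merge: runs up to `limit` async generators at
// once and yields values in whatever order they become available.
async function* all<T>(
  generators: AsyncGenerator<T>[],
  limit: number,
): AsyncGenerator<T> {
  const queue = [...generators]
  // Each running generator's in-flight next() promise, tagged with its source.
  const inFlight = new Map<
    AsyncGenerator<T>,
    Promise<{ gen: AsyncGenerator<T>; res: IteratorResult<T> }>
  >()

  const pump = (gen: AsyncGenerator<T>) => {
    inFlight.set(gen, gen.next().then(res => ({ gen, res })))
  }

  // Start the first `limit` generators; the rest wait in the queue.
  while (inFlight.size < limit && queue.length > 0) pump(queue.shift()!)

  while (inFlight.size > 0) {
    // Wait for whichever generator produces a value first.
    const { gen, res } = await Promise.race(inFlight.values())
    inFlight.delete(gen)
    if (res.done) {
      // This generator finished; admit the next queued one, if any.
      if (queue.length > 0) pump(queue.shift()!)
    } else {
      yield res.value          // out-of-order: fastest result surfaces first
      pump(gen)                // ask the same generator for its next value
    }
  }
}

const sleep = (ms: number) => new Promise(r => setTimeout(r, ms))
async function* one(label: string, ms: number) {
  await sleep(ms)
  yield label
}
```

With `one('slow', 50)` and `one('fast', 10)` run through `all(..., 2)`, `'fast'` is yielded first even though it was listed second, which is exactly the responsiveness property described above.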
The in-progress tracking serves multiple purposes. It allows the UI to show which tools are currently running, enables proper cleanup on abort, and prevents duplicate execution if the same tool_use block is encountered twice.
Serial Tool Execution
Non-concurrency-safe tools run one at a time, with context changes applied immediately after each tool:
// src/services/tools/toolOrchestration.ts
async function* runToolsSerially(
  toolUseMessages: ToolUseBlock[],
  assistantMessages: AssistantMessage[],
  canUseTool: CanUseToolFn,
  toolUseContext: ToolUseContext,
): AsyncGenerator<MessageUpdate, void> {
  let currentContext = toolUseContext
  for (const toolUse of toolUseMessages) {
    toolUseContext.setInProgressToolUseIDs(prev =>
      new Set(prev).add(toolUse.id),
    )
    for await (const update of runToolUse(
      toolUse,
      assistantMessages.find(_ =>
        Array.isArray(_.message.content) && _.message.content.some(
          _ => _.type === 'tool_use' && _.id === toolUse.id,
        ),
      )!,
      canUseTool,
      currentContext,
    )) {
      if (update.contextModifier) {
        currentContext = update.contextModifier.modifyContext(currentContext)
      }
      yield {
        message: update.message,
        newContext: currentContext,
      }
    }
    markToolUseAsComplete(toolUseContext, toolUse.id)
  }
}
The key difference from concurrent execution is that context modifiers are applied immediately. This is necessary because serial tools may depend on the state changes from previous tools. For example, if a tool changes the working directory, the next tool should see that change. By applying context modifiers immediately, the harness ensures that each serial tool sees the cumulative effect of all previous tools in the batch.
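The working-directory example can be made concrete with a tiny sketch. The `Ctx`, `runCd`, and `runLs` names are hypothetical; the real context type carries much more state:

```typescript
// Why serial tools need immediate context application: a hypothetical
// minimal context holding only a working directory.
interface Ctx { cwd: string }
type Modifier = (ctx: Ctx) => Ctx

// A cd-like tool returns a context modifier...
function runCd(dir: string): Modifier {
  return ctx => ({ ...ctx, cwd: dir })
}
// ...and an ls-like tool reads from the context.
function runLs(ctx: Ctx): string {
  return `listing of ${ctx.cwd}`
}

let ctx: Ctx = { cwd: '/home/user' }
ctx = runCd('/tmp')(ctx)   // applied immediately, before the next tool runs
console.log(runLs(ctx))    // the next tool sees the new directory: /tmp
```

Had the modifier been queued until the end of the batch, as in the concurrent path, `runLs` would still see `/home/user`.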
The Streaming Tool Executor
For real-time tool execution during streaming, the StreamingToolExecutor processes tools as they arrive rather than waiting for the entire response:
// src/services/tools/StreamingToolExecutor.ts
class StreamingToolExecutor {
  addTool(toolBlock: ToolUseBlock, assistantMessage: AssistantMessage) {
    // Queue tool for execution
  }

  async *getCompletedResults() {
    // Yield results as tools complete
  }

  async *getRemainingResults() {
    // Yield results for all remaining tools (used on abort)
  }

  discard() {
    // Discard pending results (used on fallback)
  }
}
The streaming executor is used in the query loop as follows:
// src/query.ts
let streamingToolExecutor = useStreamingToolExecution
  ? new StreamingToolExecutor(
      toolUseContext.options.tools,
      canUseTool,
      toolUseContext,
    )
  : null

if (streamingToolExecutor) {
  for (const toolBlock of msgToolUseBlocks) {
    streamingToolExecutor.addTool(toolBlock, assistantMessage)
  }

  for await (const result of streamingToolExecutor.getCompletedResults()) {
    if (result.message) {
      yield result.message
      toolResults.push(...)
    }
  }

  for await (const update of streamingToolExecutor.getRemainingResults()) {
    if (update.message) {
      yield update.message
    }
  }
}
The streaming approach provides several benefits. First, it reduces latency — tools start executing as soon as their blocks arrive rather than waiting for the entire response. Second, it enables progress feedback — users see tools executing in real time as the model generates them. Third, it supports interruption — if the user stops the agent mid-response, tools that have already been queued can complete or be cleanly cancelled.
The getRemainingResults method is particularly important for clean shutdown. When the user interrupts or the model falls back, this method generates synthetic tool_result blocks for any tools that were queued but not yet completed. This maintains the invariant that every tool_use has a matching tool_result.
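A sketch of what such synthetic results might look like, using the tool_result content-block shape from the Anthropic Messages API. The helper name, the ID values, and the exact error text are illustrative:

```typescript
// Sketch: fabricate a tool_result for every tool_use that was queued but
// never finished, preserving the one-result-per-tool_use invariant.
interface ToolResultBlock {
  type: 'tool_result'
  tool_use_id: string   // must match the orphaned tool_use block's id
  content: string
  is_error: boolean
}

function syntheticResultsFor(pendingToolUseIDs: string[]): ToolResultBlock[] {
  return pendingToolUseIDs.map(id => ({
    type: 'tool_result',
    tool_use_id: id,
    content: 'Tool execution was interrupted before completion.',
    is_error: true,
  }))
}

const results = syntheticResultsFor(['toolu_01', 'toolu_02'])
console.log(results.length) // one synthetic result per pending tool_use
```

Without this step, the next API request would contain tool_use blocks with no matching tool_result, which the Messages API rejects.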