Agentic AI

Found from Claude Code: Chapter 1: The Harness Paradigm

Ken Huang
Apr 02, 2026

What Is an Agentic AI Harness?

An AI harness is the infrastructure layer that sits between the language model and the external world. Its primary responsibility is to constrain and direct the model’s capabilities toward productive ends while preventing harm. Think of it as the control system for an autonomous agent — it decides what actions the agent can take, monitors the outcomes, and intervenes when things go wrong.

The harness addresses five fundamental challenges that arise when deploying LLMs in production. First, it must constrain the action space. A raw language model can output any text, but an agent in production needs to perform specific, well-defined actions. The harness provides tools with strict input schemas that channel the model’s intentions into safe, structured operations.
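As a sketch, a tool with a strict input schema might look like the following. The tool name, fields, and schema are illustrative assumptions, not Claude Code's actual definitions:

```typescript
// A hypothetical "read_file" tool. The JSON Schema on its input is what
// channels the model's free-form intent into a validated operation.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema for allowed inputs
}

const readFileTool: ToolDefinition = {
  name: "read_file",
  description: "Read the contents of a file at a given path.",
  inputSchema: {
    type: "object",
    properties: {
      path: { type: "string", description: "Path of the file to read" },
    },
    required: ["path"],
    additionalProperties: false, // reject fields the schema does not define
  },
};
```

Any model output that does not validate against the schema is rejected before it ever touches the file system.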

Second, the harness must manage conversation state. Real-world tasks often require many turns of back-and-forth between the user and the agent. The harness tracks this conversation history, manages the finite context window, and ensures that the agent has the information it needs at each step.

Third, the harness enforces safety through permissions. Before the agent executes any action, the harness evaluates whether that action should be allowed. This evaluation considers the user’s explicit permissions, automated safety classifiers, and the current operational context.
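A minimal sketch of such a permission gate, where the decision values, tool names, and rule sets are assumptions for illustration:

```typescript
// Three possible outcomes for any proposed tool use.
type PermissionDecision = "allow" | "deny" | "ask_user";

interface ToolUseRequest {
  toolName: string;
  input: Record<string, unknown>;
}

const ALWAYS_ALLOWED = new Set(["read_file", "list_directory"]); // read-only tools
const ALWAYS_DENIED = new Set(["format_disk"]); // never permitted

function evaluatePermission(req: ToolUseRequest): PermissionDecision {
  if (ALWAYS_DENIED.has(req.toolName)) return "deny";
  if (ALWAYS_ALLOWED.has(req.toolName)) return "allow";
  // Anything not explicitly classified is surfaced to the user.
  return "ask_user";
}
```

Real systems layer user-configured rules and automated classifiers on top of static lists like these, but the shape of the decision is the same.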

Fourth, the harness handles failures gracefully. Network calls fail, APIs return errors, and agents sometimes produce unexpected output. The harness must detect these failures, attempt recovery when possible, and surface clear error messages when recovery is not possible.
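A common pattern for this kind of graceful handling is retry with exponential backoff. The sketch below is illustrative, not Claude Code's actual recovery logic:

```typescript
// Retry a flaky async operation with exponential backoff, then surface
// a clear error once recovery is no longer possible.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // 500ms, 1000ms, 2000ms, ... between attempts
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
      }
    }
  }
  throw new Error(`Operation failed after ${maxAttempts} attempts: ${lastError}`);
}
```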

Fifth, the harness optimizes resource usage. LLM API calls cost money and consume tokens from a limited context window. The harness tracks costs, manages token budgets, and compacts conversation history when necessary to stay within constraints.
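Compaction can be sketched as trimming the oldest turns once a token budget is exceeded. The message shape and the rough four-characters-per-token estimate below are simplifying assumptions; a real harness would summarize dropped turns rather than discard them:

```typescript
interface Msg {
  role: "user" | "assistant";
  content: string;
}

// Crude token estimate: ~4 characters per token.
const estimateTokens = (m: Msg) => Math.ceil(m.content.length / 4);

function compact(history: Msg[], tokenBudget: number): Msg[] {
  const kept = [...history];
  let total = kept.reduce((n, m) => n + estimateTokens(m), 0);
  while (total > tokenBudget && kept.length > 1) {
    total -= estimateTokens(kept.shift()!); // drop the oldest turn first
  }
  if (kept.length < history.length) {
    // Real harnesses replace dropped turns with an LLM-written summary;
    // a placeholder marks the elision here.
    kept.unshift({ role: "assistant", content: "[earlier conversation compacted]" });
  }
  return kept;
}
```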

The Core Insight

The fundamental insight of harness engineering is that the challenge of production AI is not primarily about the model itself. The model provides intelligence, but the harness provides control. A well-designed harness can make even a modest model reliable and safe, while a poorly designed harness can make even the most capable model dangerous and unpredictable.

This insight shifts the focus of AI engineering from prompt crafting and model selection to system design and infrastructure. The question is not “how do we make the model produce better output?” but rather “how do we build a system around the model that produces good outcomes?”

Consider a simple example. A raw language model might be asked to “delete all files in the current directory.” Without a harness, the model might generate text that looks like a shell command, and a naive system might execute it directly. With a harness, the request is intercepted. The harness checks whether the user has permission to perform destructive operations, whether the command matches any safety rules, and whether there are alternative approaches that achieve the same goal without destruction. The harness might surface this decision to the user for approval, or it might automatically redirect the agent toward a safer approach.

The harness does not make the model smarter about file deletion. Instead, it creates a system where the consequences of the model’s output are controlled and safe. This is the essence of harness engineering.

The QueryEngine as Harness Core

At the heart of the Claude Code harness is the QueryEngine class. This class embodies the core responsibilities of the harness — managing conversation state, orchestrating tool execution, tracking costs, and enforcing safety. Examining its structure reveals the essential components of any AI harness.

The QueryEngine maintains several pieces of mutable state that persist across turns:

```typescript
// src/QueryEngine.ts
export class QueryEngine {
  private config: QueryEngineConfig
  private mutableMessages: Message[]
  private abortController: AbortController
  private permissionDenials: SDKPermissionDenial[]
  private totalUsage: NonNullableUsage
  // …
}
```

The mutableMessages array holds the complete conversation history. Every user message, assistant response, and tool result is appended to this array. This is the harness’s memory — it remembers everything that has happened in the conversation so far.

The abortController provides a cancellation mechanism. If the user decides to stop the agent mid-execution, the harness can signal through this controller to halt ongoing operations. This is a critical safety feature — the user must always be able to regain control.
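This cancellation path uses the standard AbortController/AbortSignal API. The tool call below is a stand-in for illustration; real calls would forward the signal to fetch() or a subprocess so cancellation propagates all the way down:

```typescript
function longRunningToolCall(signal: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve("done"), 10_000);
    signal.addEventListener("abort", () => {
      clearTimeout(timer); // stop the in-flight work
      reject(new Error("Cancelled by user"));
    });
  });
}

const controller = new AbortController();
const pending = longRunningToolCall(controller.signal).catch((e: Error) => e.message);

// Elsewhere, when the user presses Escape:
controller.abort();
```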

The permissionDenials array records every time the harness blocked a tool use. This serves both as an audit trail and as feedback for improving the permission system. In production systems, every denial should be examined to understand whether it was correct or whether the permission rules need adjustment.

The totalUsage field tracks cumulative token consumption and costs. The harness must know how many tokens have been used and how much the session has cost so it can enforce budgets and warn users before they exceed their limits.
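A sketch of that bookkeeping, where the field names and the per-million-token prices are illustrative assumptions, not real rates:

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

class UsageTracker {
  readonly total: Usage = { inputTokens: 0, outputTokens: 0 };

  // Accumulate the usage reported for one conversation turn.
  add(turn: Usage): void {
    this.total.inputTokens += turn.inputTokens;
    this.total.outputTokens += turn.outputTokens;
  }

  // Session cost at assumed example rates (USD per million tokens).
  costUSD(perMInput = 3, perMOutput = 15): number {
    return (
      (this.total.inputTokens / 1e6) * perMInput +
      (this.total.outputTokens / 1e6) * perMOutput
    );
  }

  overBudget(limitUSD: number): boolean {
    return this.costUSD() > limitUSD;
  }
}
```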

The QueryEngine exposes its functionality through a single method that handles the entire conversation lifecycle:

```typescript
// src/QueryEngine.ts
async *submitMessage(
  prompt: string | ContentBlockParam[],
  options?: { uuid?: string; isMeta?: boolean },
): AsyncGenerator<SDKMessage, void, unknown>
```

This method is an async generator, which means it yields results incrementally rather than blocking until the entire conversation turn is complete. This design enables real-time streaming of the agent’s responses — the user sees text appear as it is generated, and tool executions are reported as they complete. The async generator pattern is fundamental to building responsive AI systems.
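To see why the generator shape matters, here is how a caller might consume such a stream. The fakeSubmitMessage generator stands in for the real method, and the message shape is assumed for illustration:

```typescript
interface StreamedMessage {
  type: "assistant" | "tool_result";
  text: string;
}

async function* fakeSubmitMessage(): AsyncGenerator<StreamedMessage, void, unknown> {
  yield { type: "assistant", text: "Let me list the files." };
  yield { type: "tool_result", text: "3 files found" };
  yield { type: "assistant", text: "Done." };
}

async function renderStream(): Promise<string[]> {
  const rendered: string[] = [];
  // for await consumes each yielded message as soon as it is produced,
  // which is what lets a UI paint output incrementally.
  for await (const msg of fakeSubmitMessage()) {
    rendered.push(`[${msg.type}] ${msg.text}`);
  }
  return rendered;
}
```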
