Claude Code Harness Pattern 9: Observability and Debugging
Introduction
In the previous chapter, we examined memory systems and state persistence: how the harness maintains persistent conversation state, records transcripts to disk, caches file reads, attaches and deduplicates relevant memories, compacts session memory while preserving the memory directory, tracks attribution, snapshots file history, preserves segments after compaction, persists agent metadata, and tracks skill discovery. We learned how the QueryEngine serves as the central state container and how transcript recording enables session recovery. But memory is only one aspect of building reliable AI systems: the harness must also provide visibility into its own operations through logging, analytics, and debugging tools. This is the responsibility of the observability system.
Observability is critical for building and maintaining production AI agents. Without visibility into what the agent is doing, debugging becomes guesswork. With effective observability, engineers can trace conversations across turns, measure performance at every stage, understand why decisions were made, and quickly identify the root cause of failures. The Claude Code harness implements comprehensive observability through structured logging, query chain tracking, debug logging, error logging, headless profiling, query profiling, multi-turn debugging tools, performance profiling, SDK event queuing, and rich error diagnostics.
This chapter examines the observability system in detail: structured logging with rich metadata, chain IDs that trace conversations across turns and subagent spawns, debug logging for troubleshooting, error capture with full context, latency checkpoints and query-specific timing, the continue sites pattern for debugging multi-turn conversations, performance profiling through token counting and cost tracking, event queuing for SDK consumers, and rich diagnostics when errors occur. We will see how the branded analytics metadata type prevents accidental PII leakage, how the query chain ID enables end-to-end tracing, how the headless profiler measures latency, how the continue sites pattern makes multi-turn debugging tractable, and how error diagnostics provide actionable context.
Structured Logging for Agent Systems
The harness uses structured logging throughout for observability. Every significant event is logged with rich metadata that enables analysis and debugging:
// src/services/analytics/index.js
logEvent('tengu_agent_tool_selected', {
agent_type: selectedAgent.agentType as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS,
model: resolvedAgentModel as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS,
source: selectedAgent.source as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS,
color: selectedAgent.color as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS,
is_built_in_agent: isBuiltInAgent(selectedAgent),
is_resume: false,
is_async: (run_in_background === true || selectedAgent.background === true) && !isBackgroundTasksDisabled,
is_fork: isForkPath
})
The event name identifies what happened — in this case, an agent was selected for spawning. The metadata fields provide context about the event — the agent type, model, source, color, whether it is a built-in agent, whether it is a resume, whether it runs asynchronously, and whether it uses the fork path. This structured data enables analysis of agent usage patterns, performance characteristics, and user behavior.
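A minimal sketch of what such a structured event logger might look like. The in-memory `eventSink` and the `EventMetadata` type are assumptions for illustration; the real harness forwards events to its analytics backend rather than an array.

```typescript
// Hypothetical structured event logger: every event carries a name,
// typed metadata fields, and a timestamp for later correlation.
type EventMetadata = Record<string, string | number | boolean>;

interface AnalyticsEvent {
  name: string;
  metadata: EventMetadata;
  timestamp: number;
}

// Stand-in sink; the production system ships events off-process.
const eventSink: AnalyticsEvent[] = [];

function logEvent(name: string, metadata: EventMetadata): void {
  eventSink.push({ name, metadata, timestamp: Date.now() });
}

// Usage: record an agent-selection event with structured fields.
logEvent('tengu_agent_tool_selected', {
  agent_type: 'general-purpose',
  is_built_in_agent: true,
  is_async: false,
});
```

Because every field is a simple scalar keyed by a stable name, downstream analysis can filter and aggregate events without parsing free-form log strings.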
Key Events
The harness logs many different event types, each with its own set of metadata fields. The tengu_agent_tool_selected event fires when an agent is spawned, capturing the agent type, model, whether it is async, and whether it uses the fork path. The tengu_auto_compact_succeeded event fires when compaction completes, capturing the original message count, compacted message count, and token counts. The tengu_query_error event fires when the query loop encounters an error, capturing the assistant messages, tool uses, and query chain ID. The tengu_model_fallback_triggered event fires when the system falls back to a different model, capturing the original model and fallback model. The tengu_token_budget_completed event fires when the token budget is exhausted, capturing the percentage, turn tokens, and budget. The permission_decision event fires when a permission is evaluated, capturing the tool name, decision, source, and mode.
Analytics Metadata Type Safety
The codebase uses a branded type to ensure analytics metadata never contains code or filepaths:
// src/services/analytics/index.js
export type AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS = string & {
readonly __analyticsMetadata: unique symbol
}
This branded type is a TypeScript pattern that creates a distinct type from string. Values of this type can only be created through explicit casting, which forces the developer to verify that the value does not contain code or filepaths. This prevents accidental PII or code leakage into analytics, which is critical for privacy and security compliance.
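The mechanics of the brand can be sketched as follows. The `assertSafeMetadata` helper is hypothetical; it simply marks the one place in the code where a human must verify the value before casting.

```typescript
// Branded-type sketch: the phantom property makes plain strings
// unassignable, so every value must pass through an explicit cast site.
type SafeMetadata = string & { readonly __analyticsMetadata: unique symbol };

// Hypothetical helper marking the single vetted cast site. A reviewer
// confirms here that `value` is an enum-like label, never code or a path.
function assertSafeMetadata(value: string): SafeMetadata {
  return value as SafeMetadata;
}

// An analytics field accepts only branded values.
function track(field: SafeMetadata): string {
  return field; // at runtime a SafeMetadata is still just a string
}

const model = assertSafeMetadata('example-model-name');
const tracked = track(model);
// track('raw string')  // compile error: plain string lacks the brand
```

The brand exists only at compile time; it is erased entirely from the emitted JavaScript, so the pattern costs nothing at runtime.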
Query Chain Tracking
Every query is assigned a chain ID for tracing across turns and subagent spawns:
// src/query.ts
const queryTracking = toolUseContext.queryTracking
? {
chainId: toolUseContext.queryTracking.chainId,
depth: toolUseContext.queryTracking.depth + 1,
}
: {
chainId: deps.uuid(),
depth: 0,
}
const queryChainIdForAnalytics =
queryTracking.chainId as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS
toolUseContext = {
...toolUseContext,
queryTracking,
}
When a query starts, it checks whether there is an existing chain ID in the tool use context. If there is, it increments the depth and uses the same chain ID. If not, it generates a new chain ID and sets the depth to zero. This creates a chain of related queries — the initial user query has depth zero, subagent queries have depth one, sub-subagent queries have depth two, and so on.
The chain ID is included in all analytics events, enabling end-to-end tracing of a conversation from the first user message through all turns and subagent spawns. By querying analytics for a specific chain ID, engineers can reconstruct the full execution path and understand how decisions propagated through the system.
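The propagation rule above can be sketched as a small pure function. The counter-based `uuid` is a stand-in for `deps.uuid()`, and `deriveTracking` is a hypothetical name for the branching logic shown in the excerpt.

```typescript
// Chain-ID propagation: a root query mints a new chain at depth 0;
// each nested (subagent) query reuses the chain and increments depth.
interface QueryTracking {
  chainId: string;
  depth: number;
}

let nextId = 0;
const uuid = (): string => `chain-${nextId++}`; // stand-in for deps.uuid()

function deriveTracking(parent?: QueryTracking): QueryTracking {
  return parent
    ? { chainId: parent.chainId, depth: parent.depth + 1 } // same chain, one level deeper
    : { chainId: uuid(), depth: 0 };                       // new chain at the root
}

const root = deriveTracking();       // depth 0, fresh chainId
const sub = deriveTracking(root);    // depth 1, same chainId
const subSub = deriveTracking(sub);  // depth 2, same chainId
```

Filtering analytics on one `chainId` then yields the whole tree of queries, ordered by depth.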
Debug Logging
For development and troubleshooting, the harness includes debug logging:
// src/utils/debug.js
export function logForDebugging(message: string, metadata?: { level?: string }) {
// Logs to debug file / stderr depending on configuration
}
// Usage throughout the codebase
logForDebugging(
`autocompact: tokens=${tokenCount} threshold=${threshold} effectiveWindow=${effectiveWindow}${snipTokensFreed > 0 ? ` snipFreed=${snipTokensFreed}` : ''}`,
)
logForDebugging(`Failed to get system prompt for agent ${selectedAgent.agentType}: ${errorMessage(error)}`)
logForDebugging(`Sync agent error: ${errorMessage(error)}`, { level: 'error' })
Debug logging is more verbose than analytics logging and includes detailed context that is useful for troubleshooting but too verbose for production analytics. The debug logs are written to a debug file or stderr depending on configuration, allowing engineers to inspect the full execution trace when debugging issues.
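A minimal sketch of such a leveled debug logger. The in-memory `sink` is an assumption so the behavior is observable; the real implementation routes to a debug file or stderr depending on configuration.

```typescript
// Hypothetical debug logger: each message gets a level prefix and is
// appended to a sink (a file or stderr in the real harness).
const sink: string[] = [];

function logForDebugging(message: string, metadata?: { level?: string }): void {
  const level = metadata?.level ?? 'debug'; // default level when none given
  sink.push(`[${level}] ${message}`);
}

// Usage mirrors the calls shown above.
logForDebugging('autocompact: tokens=51200 threshold=40000');
logForDebugging('Sync agent error: timeout', { level: 'error' });
```

Keeping the call signature this small means debug statements can be sprinkled liberally without ceremony, then filtered by level when inspecting a trace.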
Error Logging
Errors are captured with full context:
// src/utils/log.js
export function logError(error: unknown) {
// Structured error logging
}
// Usage
if (innerError instanceof FallbackTriggeredError && fallbackModel) {
logEvent('tengu_model_fallback_triggered', {
original_model: innerError.originalModel,
fallback_model: fallbackModel,
entrypoint: 'cli',
queryChainId: queryChainIdForAnalytics,
queryDepth: queryTracking.depth,
})
}
Error logging captures not just the error itself but also the context in which it occurred — the original model, the fallback model, the entrypoint, the query chain ID, and the query depth. This context is essential for understanding why the error occurred and how to prevent it in the future.
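One way to carry this context is to attach it to the error type itself, as the `FallbackTriggeredError` check above implies. This sketch assumes a shape for that class and introduces a hypothetical `describeFallback` helper; the real fields may differ.

```typescript
// Hypothetical error type that carries the context needed for logging.
class FallbackTriggeredError extends Error {
  constructor(readonly originalModel: string) {
    super(`Model fallback triggered from ${originalModel}`);
    this.name = 'FallbackTriggeredError';
  }
}

// Build a structured payload mirroring the analytics event fields.
function describeFallback(
  err: unknown,
  fallbackModel: string,
): { original_model: string; fallback_model: string } | null {
  if (err instanceof FallbackTriggeredError) {
    return { original_model: err.originalModel, fallback_model: fallbackModel };
  }
  return null; // not a fallback error; caller logs it generically
}

const payload = describeFallback(new FallbackTriggeredError('model-a'), 'model-b');
```

Because the context travels with the error object, any catch site can emit a fully populated event without reaching back into the state that threw it.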
Ant-Specific Error Logging
Internal builds get additional error logging:
// src/query.ts
// To help track down bugs, log loudly for ants
logAntError('Query error', error)
This internal logging is more verbose than the public logging and includes additional context that is useful for internal debugging but not appropriate for external users.
Headless Profiling and Checkpoints
The harness includes a profiling system for measuring latency at key points in the query lifecycle:
// src/utils/headlessProfiler.js
export function headlessProfilerCheckpoint(name: string) {
// Records timestamp for latency analysis
}
// Usage throughout the query lifecycle
headlessProfilerCheckpoint('before_getSystemPrompt')
// ... fetch system prompt ...
headlessProfilerCheckpoint('after_getSystemPrompt')
headlessProfilerCheckpoint('before_skills_plugins')
// ... load skills and plugins ...
headlessProfilerCheckpoint('after_skills_plugins')
headlessProfilerCheckpoint('query_fn_entry')
headlessProfilerCheckpoint('query_setup_start')
headlessProfilerCheckpoint('query_setup_end')
headlessProfilerCheckpoint('query_api_loop_start')
headlessProfilerCheckpoint('query_api_streaming_start')
headlessProfilerCheckpoint('query_api_streaming_end')
headlessProfilerCheckpoint('query_tool_execution_start')
headlessProfilerCheckpoint('query_tool_execution_end')
headlessProfilerCheckpoint('query_recursive_call')
The checkpoint names describe the specific phase of execution. By recording timestamps at each checkpoint, the system can measure the duration of each phase — system prompt generation, skill and plugin loading, API streaming, tool execution, and recursive calls. This enables identification of performance bottlenecks and optimization opportunities.
The profiling data is aggregated and reported in analytics, allowing engineers to track performance trends over time and identify regressions.
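A checkpoint profiler of this shape can be sketched in a few lines. The `phaseDurations` helper is an assumption about how the recorded timestamps might be turned into per-phase measurements; the real aggregation pipeline is not shown in the excerpt.

```typescript
// Checkpoint profiler sketch: record named timestamps, then derive
// phase durations from the gaps between consecutive checkpoints.
const checkpoints: { name: string; at: number }[] = [];

function headlessProfilerCheckpoint(name: string): void {
  checkpoints.push({ name, at: Date.now() });
}

// Hypothetical aggregation: each phase spans two adjacent checkpoints.
function phaseDurations(): { phase: string; ms: number }[] {
  return checkpoints.slice(1).map((cp, i) => ({
    phase: `${checkpoints[i].name} -> ${cp.name}`,
    ms: cp.at - checkpoints[i].at,
  }));
}

headlessProfilerCheckpoint('before_getSystemPrompt');
// ... work being measured happens here ...
headlessProfilerCheckpoint('after_getSystemPrompt');

const phases = phaseDurations();
```

Recording only named timestamps keeps the instrumentation cheap at each call site; all pairing and duration math is deferred until the data is reported.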