Claude Code Harness Pattern 8: Memory Systems and State Persistence
Introduction
In the previous chapter, we examined multi-agent coordination: how the harness spawns and manages agents through the AgentTool, routes between execution modes, isolates agents in worktrees, inherits system prompts for efficiency, coordinates teams of agents, and enables inter-agent communication. We also saw how it manages the full lifecycle of spawned agents, from allocating resources and tracking progress to handling errors and cleaning up on completion. Multi-agent coordination is only one aspect of building persistent AI systems, however. Agents must also maintain memory across sessions, retain knowledge from past interactions, and provide attribution for learned information. This is the responsibility of the memory system.
Memory and state persistence are fundamental to creating AI agents that feel continuous and coherent. Without memory, each conversation would start from scratch, with no knowledge of previous interactions. Users would need to re-explain their preferences, project context, and ongoing work every time they interact with the agent. With effective memory systems, agents can build on past conversations, remember user preferences, and maintain context across sessions that span days or weeks.
This chapter examines the memory system in detail. It covers how the harness maintains persistent conversation state, records transcripts to disk, caches file reads, attaches relevant memories automatically and deduplicates those injections, compacts session memory while preserving the memory directory, tracks attribution, creates file history snapshots, preserves segments after compaction, persists agent metadata, and tracks skill discovery. Along the way, we will see how the QueryEngine serves as the central state container, how transcript recording enables session recovery, how the file state cache avoids redundant reads, how memory attachments are prefetched and deduplicated, and how preserved segments ensure that resume works correctly after compaction.
Session Memory and Conversation History
The harness maintains persistent conversation state across turns and sessions. The QueryEngine is the primary state container, holding the complete conversation history and related state:
// src/QueryEngine.ts
export class QueryEngine {
  private config: QueryEngineConfig
  private mutableMessages: Message[] // Full conversation history
  private abortController: AbortController
  private permissionDenials: SDKPermissionDenial[]
  private totalUsage: NonNullableUsage // Cumulative token usage
  private hasHandledOrphanedPermission = false
  private discoveredSkillNames = new Set<string>()
  private loadedNestedMemoryPaths = new Set<string>()

  constructor(config: QueryEngineConfig) {
    this.config = config
    this.mutableMessages = config.initialMessages ?? []
    this.abortController = config.abortController ?? createAbortController()
    this.permissionDenials = []
    this.totalUsage = EMPTY_USAGE
  }
}
The mutableMessages array holds the complete conversation history. Every user message, assistant response, tool result, and system message is appended to this array. This is the harness’s memory — it remembers everything that has happened in the conversation so far. The array is mutable because messages are added incrementally as the conversation progresses.
The abortController provides a cancellation mechanism. If the user decides to stop the agent mid-execution, the harness can signal through this controller to halt ongoing operations. This is a critical safety feature — the user must always be able to regain control.
The permissionDenials array records every time the harness blocked a tool use. This serves both as an audit trail and as feedback for improving the permission system. When the session ends, these denials are included in the result message, allowing SDK consumers to understand what actions were blocked and why.
The totalUsage field tracks cumulative token consumption. The harness must know how many tokens have been used so it can enforce budgets and warn users before they exceed their limits. This usage is accumulated across all turns in the session.
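To make the accumulation concrete, here is a minimal sketch of how per-turn usage might be folded into a running total and checked against a budget. The Usage shape, the accumulateUsage helper, and the isOverBudget check are all assumptions made for illustration; the chapter does not show the real NonNullableUsage type or how budgets are enforced.

```typescript
// Hypothetical usage shape; the real NonNullableUsage type is not shown here.
type Usage = { inputTokens: number; outputTokens: number }

const EMPTY_USAGE: Usage = { inputTokens: 0, outputTokens: 0 }

// Return a new total instead of mutating, so EMPTY_USAGE stays reusable.
function accumulateUsage(total: Usage, turn: Usage): Usage {
  return {
    inputTokens: total.inputTokens + turn.inputTokens,
    outputTokens: total.outputTokens + turn.outputTokens,
  }
}

// Hypothetical budget check: true once the session has spent its token budget.
function isOverBudget(total: Usage, maxTokens: number): boolean {
  return total.inputTokens + total.outputTokens >= maxTokens
}

// Two turns in the same session add up in one running total.
let totalUsage = EMPTY_USAGE
totalUsage = accumulateUsage(totalUsage, { inputTokens: 1200, outputTokens: 300 })
totalUsage = accumulateUsage(totalUsage, { inputTokens: 800, outputTokens: 200 })
```

Returning a fresh object on each turn keeps the shared empty value untouched, which matters if several engines start from the same constant.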
The discoveredSkillNames set tracks which skills were surfaced during the session. This is used for telemetry to understand which skills are being discovered and used.
The loadedNestedMemoryPaths set tracks which CLAUDE.md paths have been loaded to avoid re-injection. Without this tracking, the same CLAUDE.md file could be injected dozens of times in a busy session as the file state cache evicts entries.
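The deduplication idea can be sketched with a plain Set that is consulted before each injection. Only the field name loadedNestedMemoryPaths comes from the listing above; the NestedMemoryLoader class, its loadOnce method, and the injected readFile callback are hypothetical names used for illustration, not the harness's actual API.

```typescript
// Hypothetical helper; only the Set's field name comes from the chapter.
class NestedMemoryLoader {
  private loadedNestedMemoryPaths = new Set<string>()
  private readFile: (path: string) => string

  constructor(readFile: (path: string) => string) {
    this.readFile = readFile
  }

  // Returns the file content the first time a path is seen, null afterwards.
  loadOnce(path: string): string | null {
    if (this.loadedNestedMemoryPaths.has(path)) return null
    this.loadedNestedMemoryPaths.add(path)
    return this.readFile(path)
  }
}

// Count reads to show that the second request never touches the file.
let reads = 0
const loader = new NestedMemoryLoader((path) => {
  reads += 1
  return `contents of ${path}`
})

const first = loader.loadOnce("src/CLAUDE.md")
const second = loader.loadOnce("src/CLAUDE.md") // already injected, skipped
```

Because the Set lives on the session object rather than in the file cache, a cache eviction does not reset it, so the same path is not injected twice.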
Cross-Turn State
State persists across turns within a session. Each call to submitMessage starts a new turn, but the state — messages, file cache, usage, permission denials — carries over from previous turns:
// Each submitMessage() call starts a new turn within the same conversation
// State (messages, file cache, usage, etc.) persists across turns
async *submitMessage(prompt, options?): AsyncGenerator<SDKMessage, void, unknown> {
  // mutableMessages accumulates across turns
  // totalUsage accumulates across turns
  // permissionDenials accumulate across turns
}
This persistence is what makes multi-turn conversations possible. The agent remembers what happened in previous turns and can reference that history when responding to new prompts.
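A heavily simplified sketch shows the effect of keeping state on the engine instance rather than inside the call. MiniEngine, ChatMessage, and runTwoTurns are hypothetical stand-ins, and the echo reply replaces the real model call; the point is only that the second turn observes the history written by the first.

```typescript
// Hypothetical, heavily simplified stand-in for the real QueryEngine.
type ChatMessage = { role: "user" | "assistant"; text: string }

class MiniEngine {
  private mutableMessages: ChatMessage[] = []
  private turns = 0

  // Each call is one turn; the fields above outlive the call.
  async *submitMessage(prompt: string): AsyncGenerator<ChatMessage, void, unknown> {
    this.turns += 1
    this.mutableMessages.push({ role: "user", text: prompt })
    // A real engine would call the model here; this echo is a placeholder.
    const reply: ChatMessage = {
      role: "assistant",
      text: `turn ${this.turns}: history has ${this.mutableMessages.length} message(s)`,
    }
    this.mutableMessages.push(reply)
    yield reply
  }
}

// Drive two turns through the same engine and collect the replies.
async function runTwoTurns(): Promise<string[]> {
  const engine = new MiniEngine()
  const replies: string[] = []
  for (const prompt of ["rename the helper", "now update its tests"]) {
    for await (const message of engine.submitMessage(prompt)) {
      replies.push(message.text)
    }
  }
  return replies
}
```

The second reply reports a longer history than the first because both turns share one mutableMessages array.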
Session Persistence via Transcript Recording
Conversations are persisted to disk for recovery after crashes or restarts. The recordTranscript function writes the current conversation state to a file that can be loaded later:
// src/QueryEngine.ts
const persistSession = !isSessionPersistenceDisabled()

// Pre-emptive persistence: runs before entering the query loop
// Ensures transcript is resumable even if killed before API responds
if (persistSession && messagesFromUserInput.length > 0) {
  const transcriptPromise = recordTranscript(messages)
  if (isBareMode()) {
    void transcriptPromise // Fire-and-forget
  } else {
    await transcriptPromise // Block for durability
    if (
      isEnvTruthy(process.env.CLAUDE_CODE_EAGER_FLUSH) ||
      isEnvTruthy(process.env.CLAUDE_CODE_IS_COWORK)
    ) {
      await flushSessionStorage() // Force flush
    }
  }
}
The pre-emptive persistence is critical for resume functionality. By persisting before entering the query loop, the system ensures that the user’s message is recorded even if the process is killed before the API responds. Without this, a kill between user message acceptance and API response would leave the transcript with only queue-operation entries, making the --resume command fail with “No conversation found”.
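The ordering argument can be illustrated with a small simulation. This is a sketch under stated assumptions: TranscriptStore is an in-memory stand-in for the on-disk transcript, runTurn and simulateKillBeforeResponse are hypothetical names, and a rejected promise stands in for the process being killed before the API responds.

```typescript
// Hypothetical in-memory stand-in for the on-disk transcript.
type Entry = { kind: "user" | "assistant"; text: string }

class TranscriptStore {
  entries: Entry[] = []

  async record(entry: Entry): Promise<void> {
    this.entries.push(entry)
  }
}

// Persist the user message first, then call the model.
// If callModel never returns a reply, the user message is already recorded.
async function runTurn(
  store: TranscriptStore,
  prompt: string,
  callModel: (prompt: string) => Promise<string>,
): Promise<void> {
  await store.record({ kind: "user", text: prompt })
  const reply = await callModel(prompt)
  await store.record({ kind: "assistant", text: reply })
}

// Simulate a kill between accepting the message and receiving a response.
async function simulateKillBeforeResponse(): Promise<Entry[]> {
  const store = new TranscriptStore()
  try {
    await runTurn(store, "fix the failing test", async () => {
      throw new Error("process killed before API responded")
    })
  } catch {
    // Swallowed only so the sketch can inspect what survived.
  }
  return store.entries
}
```

In this simulation the surviving transcript holds the user message and nothing else, which is the state a later resume needs in order to find the conversation.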
Persistence Strategies by Mode
The persistence strategy varies with the deployment mode. In bare mode, transcript recording is fire-and-forget: speed is prioritized over durability. In interactive and cowork modes, the system blocks until the transcript is recorded, and it also forces an immediate flush to disk when CLAUDE_CODE_EAGER_FLUSH or CLAUDE_CODE_IS_COWORK is set, prioritizing durability over speed. This lets each deployment scenario choose its own trade-off between responsiveness and data safety.
The SDK mode makes persistence configurable, allowing the caller to decide the appropriate strategy for their use case.
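The mode-dependent behaviour described in this section can be summarized as a small decision function plus an executor. This is a sketch, not the harness's code: the Mode union, PersistencePlan, persistencePlan, and persistBeforeQuery are hypothetical names, the eagerFlush flag stands in for the CLAUDE_CODE_EAGER_FLUSH check, and cowork is modeled as a mode although the listing above detects it through an environment variable.

```typescript
// Hypothetical mode names; the chapter names the modes but not their representation.
type Mode = "bare" | "interactive" | "cowork"

type PersistencePlan = { awaitRecord: boolean; forceFlush: boolean }

// Decide how durable the pre-query write should be for a given mode.
function persistencePlan(mode: Mode, eagerFlush: boolean): PersistencePlan {
  if (mode === "bare") return { awaitRecord: false, forceFlush: false }
  return { awaitRecord: true, forceFlush: eagerFlush || mode === "cowork" }
}

// Apply a plan using caller-supplied record and flush functions.
async function persistBeforeQuery(
  plan: PersistencePlan,
  recordTranscript: () => Promise<void>,
  flushSessionStorage: () => Promise<void>,
): Promise<void> {
  const transcriptPromise = recordTranscript()
  if (!plan.awaitRecord) {
    // Fire-and-forget: do not wait. The empty catch is an addition in this
    // sketch to avoid an unhandled rejection; the listing uses `void`.
    transcriptPromise.catch(() => {})
    return
  }
  await transcriptPromise // Block for durability
  if (plan.forceFlush) await flushSessionStorage()
}
```

Separating the decision from the I/O keeps the trade-off visible in one place: bare mode never waits or flushes, while the other modes always wait and flush only when asked to.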


