Chapter 10: Production Deployment Patterns (Claude Code vs. Hermes Agent)
1. Pattern Summary
Production deployment is where the harness meets the real world: multiple users, multiple environments, scheduled automation, and the need to embed the agent into existing infrastructure. A prototype that works on a developer's laptop needs to survive API outages, multi-tenant isolation, cost overruns, and compliance audits before it earns the label "production." Both Claude Code and Hermes Agent have thought hard about this problem — but from opposite directions. Claude Code is SDK-first: it exposes an async generator interface designed to be embedded inside other applications. Hermes is CLI/gateway-first: it ships as a standalone agent that users interact with directly, across messaging platforms, on a schedule, or from the command line. Understanding both deployment philosophies lets you pick the right tool for each layer of your stack — and combine them when the job demands it.
2. Claude Code Implementation
SDK Async Generator Interface
The core of Claude Code's production story is the ask() convenience wrapper and the QueryEngine.submitMessage() method underneath it. Both return async generators, which means SDK consumers receive events as they happen rather than waiting for a full response:
// src/QueryEngine.ts: the low-level SDK surface
export class QueryEngine {
  async *submitMessage(
    prompt: string | ContentBlockParam[],
    options?: { uuid?: string; isMeta?: boolean },
  ): AsyncGenerator<SDKMessage, void, unknown> {
    // Yields one SDKMessage per event: assistant text, tool use,
    // tool result, compaction boundary, retry notice, final result.
    // Consumers iterate with `for await (const msg of engine.submitMessage(...))`
  }
}

// src/QueryEngine.ts: the convenience wrapper for one-shot usage
export async function* ask({
  prompt,
  tools,
  mcpClients,
  canUseTool,
  mutableMessages = [],
  maxTurns,
  maxBudgetUsd,
  // ... 20+ more options
}): AsyncGenerator<SDKMessage, void, unknown> {
  const engine = new QueryEngine({ /* all options forwarded */ })
  try {
    yield* engine.submitMessage(prompt, { uuid: promptUuid })
  } finally {
    // Propagate file cache back to caller even on error
    setReadFileCache(engine.getReadFileState())
  }
}
The SDKMessage union covers every event type a consumer needs: assistant text, tool use summaries, compaction boundaries, retry notices, and final results. A SIEM integration, a web UI, or a CI pipeline can all consume the same stream and filter for the events they care about.
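The consume-and-filter pattern can be sketched in Python with an async generator. This is an illustration of the streaming idea only: the `Event` dataclass, its `kind` tags, and `fake_agent_stream` are hypothetical stand-ins, not the SDK's actual `SDKMessage` types.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List, Set


@dataclass
class Event:
    kind: str      # hypothetical tags, e.g. "assistant", "tool_use", "result"
    payload: str


async def fake_agent_stream() -> AsyncIterator[Event]:
    """Stand-in for an SDK stream: yields events one at a time."""
    for kind, payload in [
        ("assistant", "Scanning repository"),
        ("tool_use", "grep -r TODO src/"),
        ("retry", "rate limited, retrying"),
        ("result", "3 TODOs found"),
    ]:
        await asyncio.sleep(0)  # hand control back to the loop between events
        yield Event(kind, payload)


async def collect(stream: AsyncIterator[Event], wanted: Set[str]) -> List[Event]:
    """Consume the stream incrementally, keeping only the wanted kinds."""
    kept: List[Event] = []
    async for event in stream:
        if event.kind in wanted:
            kept.append(event)
    return kept


def audit_events() -> List[Event]:
    # A SIEM-style consumer might care only about tool use and final results.
    return asyncio.run(collect(fake_agent_stream(), {"tool_use", "result"}))
```

A web UI would typically render each event as it arrives instead of collecting a list; the loop body is the only part that changes.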
30 Compile-Time Feature Flags
Claude Code ships with 30 feature flags that control experimental capabilities. In the external build, every flag returns false, so the bundler eliminates the gated code at build time:
// src/entrypoints/cli.tsx: polyfill for external builds
function feature(name: string): boolean {
  return false // All flags off; bundler tree-shakes the dead branches
}
// Usage pattern: conditional import eliminates the module from the bundle
const reactiveCompact = feature('REACTIVE_COMPACT')
  ? require('./services/compact/reactiveCompact.js')
  : null
// Usage pattern: inline gate for experimental code paths
if (feature('CONTEXT_COLLAPSE') && contextCollapse) {
  messagesForQuery = await contextCollapse.applyCollapsesIfNeeded(...)
}
The 30 flags span six categories: autonomous agents (KAIROS, COORDINATOR_MODE, BUDDY), remote/distributed (BRIDGE_MODE, DAEMON, SSH_REMOTE), communication (UDS_INBOX), enhanced tools (WEB_BROWSER_TOOL, VOICE_MODE, MCP_SKILLS), conversation management (HISTORY_SNIP, ULTRAPLAN), and infrastructure (HARD_FAIL, TRANSCRIPT_CLASSIFIER, TORCH). Internal Anthropic builds replace the polyfill with a real feature flag service, enabling gradual rollouts and A/B testing without redeployment.
Multi-Provider Support
The harness abstracts over four cloud providers through a unified API client layer:
Model selection at runtime considers permission mode, user-specified overrides, and whether the context exceeds 200K tokens — the harness can automatically switch to a larger context model mid-session.
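The context-size part of that decision can be sketched as follows. The model names, the function name, and the precedence order (explicit override first, then context size) are assumptions made for illustration; permission mode is left out because the text does not say how it factors in.

```python
from typing import Optional

LARGE_CONTEXT_THRESHOLD = 200_000  # tokens; the 200K figure comes from the text


def select_model(
    context_tokens: int,
    user_override: Optional[str] = None,
    default_model: str = "standard-model",             # hypothetical name
    large_context_model: str = "large-context-model",  # hypothetical name
) -> str:
    """Pick a model for the next turn (assumed precedence, see lead-in)."""
    if user_override:
        return user_override
    if context_tokens > LARGE_CONTEXT_THRESHOLD:
        return large_context_model
    return default_model
```

Calling this before every turn, not once per session, is what would allow a mid-session switch once the running context grows past the threshold.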
Plugin System and MCP Integration
// src/utils/plugins/pluginLoader.js: load plugins from cache, no network call
export async function loadAllPluginsCacheOnly(): Promise<{ enabled: Plugin[] }> {
  // Scans plugin directories, validates manifests, returns enabled plugins.
  // Plugins inject custom tools into the system prompt at session start.
}
MCP (Model Context Protocol) integration spans 24 files and 12K+ lines. MCP tools are namespaced as mcp__server__tool so the harness can route calls to the right server:
// Extract server name from a namespaced MCP tool call
if (tool.name?.startsWith('mcp__')) {
  const parts = tool.name.split('__')
  const serverName = parts[1] // e.g. "mcp__filesystem__read_file" → "filesystem"
}
Deployment Checklist (from chapter10.md)
The Claude Code deployment checklist covers five areas: Infrastructure (secrets manager, rate limiting, multi-region, monitoring); Security (permission modes, tool allow/deny lists, sandbox isolation, PII filtering, audit logging); Reliability (circuit breakers, retry with backoff, graceful degradation, budget limits); Observability (structured logging with correlation IDs, cost tracking, health checks); Performance (token counting, prompt caching, concurrent tool limits).
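Two of the Reliability items, retry with backoff and budget limits, are small enough to sketch. This is a generic illustration, not Claude Code's implementation: `BudgetTracker`, `retry_with_backoff`, and the choice of `ConnectionError` as the retryable error are all assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class BudgetExceeded(Exception):
    """Raised when accumulated spend passes the configured cap."""


class BudgetTracker:
    def __init__(self, max_usd: float) -> None:
        self.max_usd = max_usd
        self.spent_usd = 0.0

    def charge(self, usd: float) -> None:
        """Record spend and fail loudly once the cap is exceeded."""
        self.spent_usd += usd
        if self.spent_usd > self.max_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.max_usd:.2f}"
            )


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,  # injectable for tests
) -> T:
    """Retry `call` on ConnectionError with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * (2 ** attempt)
            # Jitter spreads out retries from many clients hitting one outage.
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; a circuit breaker would sit one level above this, refusing calls entirely after repeated exhausted retries.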
3. Hermes Agent Implementation
CLI + Gateway Dual Entry Points
Hermes ships two production entry points. The hermes CLI is for interactive use and scripted automation. The gateway is for always-on messaging platform integration:
# hermes-agent/hermes_cli/main.py: the full CLI surface
# hermes                  → interactive chat
# hermes gateway start    → start gateway as a background service
# hermes cron             → manage scheduled jobs
# hermes doctor           → check config and dependencies
# hermes model            → switch models at runtime
# hermes sessions browse  → interactive session picker
# hermes acp              → run as ACP server for editor integration

# hermes-agent/gateway/run.py: messaging gateway entry point
# Manages platform adapters for: Telegram, Discord, Slack, WhatsApp,
# Signal, Matrix, Mattermost, DingTalk, Feishu, WeCom, SMS, Email, Webhook.
# Each platform runs as an async adapter; the gateway multiplexes them
# all under a single process with a shared agent backend.
async def start_gateway():
    """Start all configured platform adapters."""
    # Handles SSL cert auto-detection for NixOS and non-standard systems
    _ensure_ssl_certs()
    # Loads platform configs, starts adapters, runs event loop
Profile System: HERMES_HOME Isolation
Multi-tenant isolation in Hermes is built on the HERMES_HOME environment variable. Each profile gets its own directory with its own config, keys, sessions, and memories:
# hermes-agent/hermes_constants.py: single source of truth for home dir
import os
from pathlib import Path

def get_hermes_home() -> Path:
    """Return the Hermes home directory (default: ~/.hermes).

    Override with HERMES_HOME to isolate profiles:
        HERMES_HOME=~/.hermes/profiles/analyst1 hermes chat
        HERMES_HOME=~/.hermes/profiles/analyst2 hermes chat
    Each profile has its own config.yaml, .env, sessions/, memories/, cron/.
    """
    return Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))

def display_hermes_home() -> str:
    """User-friendly display: /home/alice/.hermes/profiles/analyst1 → ~/.hermes/profiles/analyst1"""
    home = get_hermes_home()
    try:
        return "~/" + str(home.relative_to(Path.home()))
    except ValueError:
        # HERMES_HOME lies outside the user's home directory; show it in full
        return str(home)
The ensure_hermes_home() function in config.py creates the full directory structure with secure permissions (0700 for dirs, 0600 for files) on first run, so spinning up a new tenant profile is a single mkdir plus an env var.
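The permission scheme can be sketched as follows for POSIX systems. This is not the body of `ensure_hermes_home()`, which the text does not show; `create_profile`, `mode_of`, and the exact set of subdirectories are assumptions based on the docstring above.

```python
import os
import stat
from pathlib import Path

# Subdirectory names taken from the get_hermes_home() docstring.
PROFILE_SUBDIRS = ("sessions", "memories", "cron")


def create_profile(root: Path, name: str) -> Path:
    """Create root/profiles/<name> with owner-only permissions (POSIX)."""
    home = root / "profiles" / name
    for directory in (home, *(home / sub for sub in PROFILE_SUBDIRS)):
        directory.mkdir(parents=True, exist_ok=True)
        # chmod explicitly: the mode passed to mkdir is filtered by the umask.
        os.chmod(directory, 0o700)
    env_file = home / ".env"
    env_file.touch(exist_ok=True)
    os.chmod(env_file, 0o600)
    return home


def mode_of(path: Path) -> int:
    """Return just the permission bits of a path, e.g. 0o700."""
    return stat.S_IMODE(path.stat().st_mode)
```

A tenant would then be launched with `HERMES_HOME` pointing at the returned path. The function is idempotent, so calling it on every start is safe.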
Runtime Config: DEFAULT_CONFIG and config.yaml
Hermes uses a layered config system. DEFAULT_CONFIG in config.py defines every settable value with sensible defaults; ~/.hermes/config.yaml overrides them at runtime without redeployment:
# hermes-agent/hermes_cli/config.py: DEFAULT_CONFIG (selected fields)
DEFAULT_CONFIG = {
    "model": "",                  # Empty = use provider default
    "providers": {},              # Per-provider API key overrides
    "fallback_providers": [],     # Ordered fallback chain on API error
    "toolsets": ["hermes-cli"],   # Active toolset groups
    "agent": {
        "max_turns": 90,                 # Hard iteration cap per conversation
        "gateway_timeout": 1800,         # Inactivity timeout for gateway sessions (seconds)
        "tool_use_enforcement": "auto",  # Force tool calls for gpt/codex models
    },
    "compression": {
        "enabled": True,
        "threshold": 0.50,      # Compress when context > 50% full
        "target_ratio": 0.20,   # Keep 20% of threshold as recent tail
        "protect_last_n": 20,   # Always keep last 20 messages verbatim
    },
    "terminal": {
        "backend": "local",     # local | ssh | docker | modal | daytona
        "timeout": 180,
        "docker_image": "nikolaik/python-nodejs:python3.11-nodejs20",
        "container_memory": 5120,  # MB
    },
    # ... auxiliary model config, browser config, checkpoint config
}
Optional env vars (defined in _EXTRA_ENV_KEYS) cover platform tokens, SSH keys, and provider credentials, all stored in ~/.hermes/.env with 0600 permissions.
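Layering user overrides on top of defaults is commonly done with a recursive merge, sketched below. Whether Hermes merges nested sections deeply or replaces them wholesale is not stated in the text, so the deep-merge behavior, the `merge_config` name, and the override values are assumptions.

```python
import copy
from typing import Any, Dict


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return defaults with overrides applied recursively; inputs are not mutated."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge key by key, not wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


DEFAULTS = {
    "model": "",
    "agent": {"max_turns": 90, "gateway_timeout": 1800},
    "compression": {"enabled": True, "threshold": 0.50},
}

# What a parsed config.yaml might contain (hypothetical override values).
USER_OVERRIDES = {"agent": {"max_turns": 40}, "compression": {"enabled": False}}

EFFECTIVE = merge_config(DEFAULTS, USER_OVERRIDES)
```

With a deep merge, a user who overrides only `agent.max_turns` keeps the default `gateway_timeout`; a shallow `dict.update` would silently drop it.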
200+ Models via OpenRouter
Hermes routes through OpenRouter by default, giving access to 200+ models from a single API key. Model switching is a runtime operation — no redeployment needed:
# Switch models at runtime: updates config.yaml immediately
hermes model
# Or set per-session via env var
HERMES_MODEL=anthropic/claude-opus-4-5 hermes chat
# Or per-cron-job in the job definition
# { "model": "google/gemini-2.5-pro", "schedule": "0 2 * * *", ... }
The smart_model_routing config block can automatically route simple queries to a cheap model and complex ones to a capable model, which optimizes cost without code changes.
Built-In Cron Scheduling
Hermes ships a production-grade cron scheduler. Jobs are defined in ~/.hermes/cron/, and the gateway's background thread checks every 60 seconds for due jobs and runs them:
# hermes-agent/cron/scheduler.py: tick() runs due jobs
# Uses a file-based lock (~/.hermes/cron/.tick.lock) so only one tick
# runs at a time even if gateway + daemon + systemd timer overlap.
def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
    """Execute a single cron job via AIAgent.run_conversation().

    - Loads config.yaml fresh every run (no restart needed for key rotation)
    - Injects HERMES_CRON_AUTO_DELIVER_* env vars for platform delivery
    - Enforces inactivity timeout (default 600s) via ThreadPoolExecutor
    - Saves output to ~/.hermes/cron/sessions/ for audit
    - Delivers result to origin chat, specific platform, or local-only
    """
    agent = AIAgent(
        model=turn_route["model"],
        quiet_mode=True,
        skip_memory=True,        # Cron prompts shouldn't corrupt user memory
        platform="cron",
        session_id=_cron_session_id,
        session_db=_session_db,  # SQLite persistence for post-run search
    )
    # Inactivity-based timeout: job can run for hours if actively calling tools,
    # but a hung API call with no activity for 600s is killed and logged.
    result = _run_with_inactivity_timeout(agent, prompt, limit=600)
The [SILENT] sentinel lets a cron job suppress delivery when there's nothing new to report. The output is still saved locally for audit, but the user's chat isn't spammed.
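The inactivity-based timeout mentioned in run_job differs from a plain wall-clock timeout: the deadline resets whenever the job shows signs of life. The internals of `_run_with_inactivity_timeout` are not shown, so the following is one possible sketch using `ThreadPoolExecutor`; `ActivityClock`, the `touch` callback, and the polling interval are assumptions.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Any, Callable, Tuple


class ActivityClock:
    """Thread-safe record of when the job last reported activity."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last = time.monotonic()

    def touch(self) -> None:
        with self._lock:
            self._last = time.monotonic()

    def idle_seconds(self) -> float:
        with self._lock:
            return time.monotonic() - self._last


def run_with_inactivity_timeout(
    task: Callable[[Callable[[], None]], Any],
    limit: float,
    poll: float = 0.05,
) -> Tuple[bool, Any]:
    """Run task(touch) in a worker thread; give up once it is idle > limit.

    Returns (True, result) on completion or (False, None) on timeout.
    Python threads cannot be killed, so on timeout this sketch only stops
    waiting; the worker thread itself keeps running until it returns.
    """
    clock = ActivityClock()
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(task, clock.touch)
    try:
        while True:
            done, _ = wait([future], timeout=poll)
            if done:
                return True, future.result()
            if clock.idle_seconds() > limit:
                return False, None
    finally:
        pool.shutdown(wait=False)
```

In this sketch the task is expected to call `touch()` after each tool call or streamed chunk, so a job that keeps working can outlive `limit` many times over while a hung call is abandoned.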
Skills Hub
Hermes integrates with agentskills.io, an open standard for shareable agent skills. Skills are reusable prompt workflows that can be invoked from cron jobs, chat sessions, or the gateway:
# Skills can be loaded into cron jobs by name
job = {
    "name": "nightly-threat-hunt",
    "schedule": "0 2 * * *",
    "skills": ["threat-hunt-v2"],  # Loaded from agentskills.io or ~/.hermes/skills/
    "prompt": "Focus on lateral movement indicators from the last 24h.",
    "deliver": "origin",
}
4. Side-by-Side Comparison Table
5. When to Use Which
Use Claude Code when:




