OpenClaw Threat Model: MAESTRO Framework Analysis
This document applies the MAESTRO framework (a 7-layer agentic AI threat model) to the OpenClaw codebase, identifying specific threats at each layer along with high-level mitigation strategies. Please comment below if we missed anything.
Layer 1 – Foundation Models
Threats Identified in OpenClaw
LM-001: Adversarial Prompt Injection via Messaging Channels (Critical, src/channels/)
Threat: Attackers send crafted messages through WhatsApp, Telegram, or Discord designed to manipulate the model into producing harmful outputs. These messages may contain hidden instructions, base64-encoded payloads, or context manipulation techniques that override system prompts. The attack exploits the model’s inability to distinguish between user instructions and malicious injections, potentially leading to unauthorized tool access, data disclosure, or harmful actions.
Mitigation: Input sanitization in src/agents/pi-embedded-helpers.ts strips thought signatures and sanitizes tool calls. Model failover in src/agents/model-fallback.ts automatically switches models when abuse is detected. Context window guards in src/agents/context-window-guard.ts prevent context overflow attacks. The /think command limits reasoning depth to reduce exploitation surface. Session compaction in src/agents/compaction.ts summarizes and resets context periodically to prevent long-term manipulation.
LM-002: Jailbreak via Multi-turn Context (High, src/sessions/)
Threat: Persistent sessions allow attackers to gradually manipulate context over multiple messages. By building trust over time and introducing subtle premise shifts, attackers can eventually bypass safety guards that would catch direct jailbreak attempts. The conversational nature of these attacks makes them harder to detect than single-turn prompt injection.
Mitigation: Session compaction in src/agents/compaction.ts periodically summarizes conversations and removes older context, limiting the window for gradual manipulation. Context window guards monitor conversation length and trigger resets before suspicious patterns can develop. The model failover system detects anomalous behavior patterns and switches to safer model configurations.
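As a rough illustration of the compaction idea, the sketch below keeps the most recent turns verbatim and folds everything older into a single summary message. The Message shape, the compact function, and the summarize callback are all hypothetical; this is not the code in src/agents/compaction.ts.

```typescript
// Hypothetical message shape; not OpenClaw's actual session types.
interface Message {
  role: "system" | "user" | "assistant";
  text: string;
}

// Replace everything except the most recent `keepRecent` messages with one
// summary message produced by the caller-supplied `summarize` callback.
function compact(
  history: Message[],
  keepRecent: number,
  summarize: (older: Message[]) => string,
): Message[] {
  if (history.length <= keepRecent) return history.slice();
  const cut = history.length - keepRecent;
  const older = history.slice(0, cut);
  const recent = history.slice(cut);
  const summary: Message = { role: "system", text: summarize(older) };
  return [summary, ...recent];
}
```

One caveat with this approach: the summary is itself derived from untrusted turns, so a summarizer that copies attacker text verbatim would carry the manipulation forward rather than remove it.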
LM-003: Model Provider API Key Exposure (Critical, src/config/, ~/.openclaw/openclaw.json)
Threat: API keys for model providers are stored in configuration files that may have overly permissive filesystem permissions. If an attacker gains access to the filesystem, they could extract credentials and incur costs or access the provider’s full API capabilities under the user’s identity.
Mitigation: The config validation system in src/config/config.ts rejects configurations with world-readable permissions. The security audit in src/security/audit.ts actively checks for and reports world-readable config files. Credentials should be stored using OS keychain integration where available, and the audit command provides warnings about permission issues.
LM-004: Prompt Injection via File Uploads (High, src/media/, src/media-understanding/)
Threat: Malicious documents or images uploaded through messaging channels are processed by the model for understanding. Attackers can embed hidden instructions in image metadata, steganographic content, or document structures that trigger when the model analyzes the file, potentially bypassing text-based input filters.
Mitigation: Media processing pipelines in src/media/ and src/media-understanding/ sanitize file content before feeding it to the model. Image metadata is stripped during preprocessing, and document parsing isolates content from structural elements that could contain injection vectors. Input validation occurs at multiple stages of the media pipeline.
LM-005: System Prompt Leakage (Medium, src/agents/bootstrap-files.ts)
Threat: AGENTS.md, SOUL.md, and TOOLS.md files that define system behavior may leak through model outputs. If users can extract these prompts, they understand the agent’s full capabilities, constraints, and internal workings, enabling more sophisticated attacks.
Mitigation: Output filtering in src/agents/pi-embedded-helpers.ts prevents known system prompt patterns from appearing in responses. The bootstrap system in src/agents/bootstrap-files.ts uses dynamic prompt construction that varies between sessions, making pattern detection harder. Monitoring for prompt extraction patterns triggers alerts when suspicious query patterns are detected.
Layer 2 – Data Operations
Threats Identified in OpenClaw
DO-001: Credential Storage in Plaintext (Critical, src/pairing/pairing-store.ts, ~/.openclaw/credentials/)
Threat: OAuth tokens, API keys, and pairing credentials are stored in JSON files without encryption at rest. An attacker with filesystem access could read all stored credentials, enabling account takeover, unauthorized messaging access, and data theft across all configured channels.
Mitigation: The pairing store uses short-lived codes with expiration (1 hour TTL) and limits pending requests to 3 per channel. Credentials should be stored using OS keychain integration. The security audit flags world-readable credential directories. Pairing codes use an alphabet that excludes ambiguous characters to prevent social engineering attacks.
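The two limits described above (a 1 hour TTL and at most 3 pending requests per channel) can be sketched as a small in-memory store. The class and field names are assumptions made for illustration and do not reflect the actual implementation in src/pairing/pairing-store.ts.

```typescript
const TTL_MS = 60 * 60 * 1000; // 1 hour, as stated above
const MAX_PENDING = 3;         // per channel, as stated above

// Hypothetical request shape.
interface PendingRequest {
  code: string;
  senderId: string;
  createdAt: number; // epoch milliseconds
}

class PendingPairings {
  private byChannel = new Map<string, PendingRequest[]>();

  // Drop expired entries, then add the request if the channel has capacity.
  add(channel: string, req: PendingRequest, now: number): boolean {
    const live = this.live(channel, now);
    const accepted = live.length < MAX_PENDING;
    if (accepted) live.push(req);
    this.byChannel.set(channel, live);
    return accepted;
  }

  // Return the matching unexpired request and remove it, so a code works once.
  redeem(channel: string, code: string, now: number): PendingRequest | undefined {
    const live = this.live(channel, now);
    const idx = live.findIndex((r) => r.code === code);
    const hit = idx >= 0 ? live.splice(idx, 1)[0] : undefined;
    this.byChannel.set(channel, live);
    return hit;
  }

  private live(channel: string, now: number): PendingRequest[] {
    return (this.byChannel.get(channel) ?? []).filter(
      (r) => now - r.createdAt < TTL_MS,
    );
  }
}
```

Passing the clock in as a parameter keeps the expiry logic easy to test without waiting an hour.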
DO-002: World-Readable State Directory (Critical, src/security/audit.ts findings)
Threat: The default state directory (~/.openclaw/) may have permissions that allow other users on the same system to read session data, conversation logs, and agent state. This exposes sensitive conversational data and potential escalation paths if the agent has elevated permissions.
Mitigation: The security audit in src/security/audit.ts checks for world-writable and world-readable state directories, reporting findings with severity levels. The system recommends permission mode 0o700 for the state directory. Installation scripts should set appropriate permissions during initial setup.
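A minimal sketch of the kind of permission check described above, assuming a POSIX filesystem. The function names are hypothetical and the finding strings are illustrative; only the 0o700 recommendation comes from the text.

```typescript
import { statSync } from "node:fs";

// Pure check on POSIX mode bits, so the logic can be tested without touching disk.
function permissionFindings(mode: number): string[] {
  const findings: string[] = [];
  if ((mode & 0o002) !== 0) findings.push("world-writable");
  if ((mode & 0o004) !== 0) findings.push("world-readable");
  if ((mode & 0o070) !== 0) findings.push("group-accessible");
  return findings;
}

// Read the permission bits of a real path and run the same check.
function auditPath(path: string): string[] {
  return permissionFindings(statSync(path).mode & 0o777);
}
```

A directory at mode 0o700 yields no findings, which matches the recommended setting. On Windows the mode bits reported by Node.js do not carry the same meaning, so a real audit would need a separate code path there.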
DO-003: Message History Leakage (High, src/config/sessions.ts)
Threat: Session logs stored on disk contain full conversation histories with all user messages and agent responses. These logs persist indefinitely unless explicitly rotated, creating a large attack surface for information disclosure.
Mitigation: Session configuration supports automatic compaction and rotation. Logs can be configured to redact sensitive patterns. The security audit flags configurations that retain excessive history. Users should configure log retention policies based on their security requirements.
DO-004: Skill Code Injection (Critical, src/security/skill-scanner.ts)
Threat: Skills loaded from ~/.openclaw/workspace/skills/ execute arbitrary code with the agent’s permissions. A malicious skill could exfiltrate data, execute system commands, access credentials, or perform other harmful actions. This is particularly dangerous because skills have full access to the agent’s capabilities.
Mitigation: The skill scanner in src/security/skill-scanner.ts implements line-by-line rule checking for dangerous patterns. It detects dangerous exec functions (exec, execSync, spawn, spawnSync) requiring child_process context, environment harvesting patterns (process.env) with network context, file system operations, and network requests. Skills are scanned before loading, and dangerous patterns trigger warnings or blocking based on severity. Skill signing should be implemented for production deployments.
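To make the rule-plus-context idea concrete, here is a small sketch of a line-by-line scanner. It is an assumption-laden simplification: the rule names, regular expressions, and ScanFinding shape are invented for illustration, and "context" is approximated as "the pattern appears anywhere in the same file".

```typescript
// Hypothetical finding shape and rule names.
interface ScanFinding {
  rule: string;
  line: number; // 1-based
  text: string;
}

const EXEC_CALL = /\b(exec|execSync|spawn|spawnSync)\s*\(/;
const CHILD_PROCESS = /child_process/;
const ENV_ACCESS = /process\.env/;
const NETWORK = /\bfetch\s*\(|https?\.request|\bhttps?:\/\//;

function scanSkill(source: string): ScanFinding[] {
  const findings: ScanFinding[] = [];
  // File-level context: exec calls only count if child_process is referenced,
  // and process.env only counts if the file also talks to the network.
  const usesChildProcess = CHILD_PROCESS.test(source);
  const usesNetwork = NETWORK.test(source);
  source.split(/\r?\n/).forEach((text, i) => {
    if (usesChildProcess && EXEC_CALL.test(text)) {
      findings.push({ rule: "dangerous-exec", line: i + 1, text: text.trim() });
    }
    if (usesNetwork && ENV_ACCESS.test(text)) {
      findings.push({ rule: "env-harvesting", line: i + 1, text: text.trim() });
    }
  });
  return findings;
}
```

Requiring context cuts false positives such as RegExp.prototype.exec, but pattern matching of this kind is easy to evade with string concatenation or dynamic imports, which is one reason skill signing is still worth adding.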
DO-005: Vector Store Poisoning (High, extensions/memory-lancedb/, src/memory/)
Threat: The memory system using LanceDB stores conversational embeddings that inform future responses. Attackers could inject malicious content that becomes embedded in the vector store, causing the model to reference poisoned memories in future conversations, potentially spreading misinformation or escalating attacks.
Mitigation: Memory isolation can segment vector stores by agent identity. Content validation before embedding creation can filter obviously malicious patterns. Memory retrieval includes similarity thresholding to avoid recall of marginally relevant but potentially poisoned entries. Regular memory audits can identify suspicious patterns.
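Similarity thresholding can be illustrated with plain cosine similarity over in-memory vectors. This sketch does not use the LanceDB API; the MemoryEntry shape, the recall function, and the threshold value are assumptions.

```typescript
// Hypothetical in-memory entry; a real store would hold these in LanceDB.
interface MemoryEntry {
  id: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Return at most `limit` entries whose similarity to the query is at least
// `minScore`, best match first. Entries below the threshold are never recalled.
function recall(
  query: number[],
  entries: MemoryEntry[],
  minScore: number,
  limit: number,
): { id: string; score: number }[] {
  return entries
    .map((e) => ({ id: e.id, score: cosine(query, e.embedding) }))
    .filter((r) => r.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

A threshold only filters marginal matches. A poisoned entry crafted to sit close to common queries would still be recalled, so thresholding complements content validation rather than replacing it.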
DO-006: Browser Profile Data Leakage (Medium, src/browser/)
Threat: Chrome browser profiles used for web automation contain cookies, session data, browsing history, and cached credentials. If the browser sandbox is compromised, this data could be extracted and used for account takeover on sites the user has authenticated with.
Mitigation: Browser automation runs in isolated contexts with minimal permissions. Profile data is cleared between sessions when configured. The browser subsystem is isolated from the main agent process. Network requests are filtered through a proxy that can block sensitive endpoints.
Layer 3 – Agent Frameworks
Threats Identified in OpenClaw
AF-001: Tool Misuse via Prompt Injection (Critical, src/agents/bash-tools.ts, src/browser/)
Threat: Attackers craft messages that trigger bash or browser tools to execute unauthorized commands. A successful injection could delete files, exfiltrate data, install malware, or perform any action the agent’s tools permit. This is the most direct path from conversation to system compromise.
Mitigation: All tool calls are logged with full parameters for audit. Tool usage patterns are monitored for anomalies. The sandbox configuration in src/config/config.ts includes allowlists and denylists for tool access. Elevated commands require explicit approval. Bash commands run through restricted shells where possible.
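The allowlist, denylist, and approval rules above could combine along the following lines. The ToolPolicy shape and the precedence order (deny first, then default-deny, then approval) are assumptions for illustration, not the actual schema in src/config/config.ts.

```typescript
// Hypothetical policy shape.
interface ToolPolicy {
  allow: string[];
  deny: string[];
  requireApproval: string[];
}

type Decision = "allow" | "deny" | "needs-approval";

function decideToolCall(tool: string, policy: ToolPolicy): Decision {
  // An explicit deny always wins, even if the tool is also allowlisted.
  if (policy.deny.includes(tool)) return "deny";
  // Anything not explicitly allowed is denied by default.
  if (!policy.allow.includes(tool)) return "deny";
  // Allowed but sensitive tools still wait for a human.
  return policy.requireApproval.includes(tool) ? "needs-approval" : "allow";
}
```

Default-deny matters here because a prompt injection will typically reach for whichever tool the policy author forgot to list.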
AF-002: Session Spawning Abuse (High, src/agents/openclaw-tools.ts)
Threat: The sessions_spawn tool creates sub-agents with potentially different permissions. Attackers could spawn unauthorized sessions, use them as covert channels to bypass controls, or escalate privileges by creating sessions with elevated access.
Mitigation: Session spawning requires explicit allowlist configuration. Sub-agents inherit restricted tool sets by default. Spawning is logged with full context for audit. The security audit flags configurations that allow spawning without proper controls. Human-in-the-loop approval can be required for session creation.
AF-003: Cross-Session Data Leakage (High, src/config/config.ts - session.dmScope)
Threat: Multi-user direct messages share session context if dmScope is misconfigured. Users in group DMs could see messages from other users, or the agent could leak information from one conversation to another, violating confidentiality.
Mitigation: The configuration validates dmScope settings and warns about configurations that could leak data. Session isolation is the default behavior. The security audit flags misconfigurations. Access groups enforce explicit authorization before including users in sensitive contexts.
AF-004: Elevated Mode Exploitation (Critical, src/cli/exec-approvals-cli.ts)
Threat: The /elevated on command grants privileged access to bash commands and other restricted tools. Without sufficient controls, this could allow attackers who have achieved any level of access to gain full system control through the elevated session.
Mitigation: Elevated mode requires explicit configuration with allowlists. The security audit in src/security/audit.ts flags wildcard allowlists in elevated mode. Elevated commands require human approval where configured. Elevated sessions have timeout limits. All elevated commands are logged with full audit trails.
AF-005: Sandbox Escape (High, src/config/config.sandbox-docker.test.ts)
Threat: Docker sandboxing intended to isolate agent operations may be bypassed through volume mounts, container escape vulnerabilities, or misconfigured security options. A successful escape gives the attacker access to the host system.
Mitigation: Docker sandbox configuration in src/config/config.sandbox-docker.test.ts validates security options. Containers run with minimal privileges (no --privileged, no host network). Volume mounts are restricted to specific paths. Security profiles (seccomp, AppArmor) limit system call access. The security audit checks for dangerous configurations.
AF-006: Agent-to-Agent Communication Abuse (Medium, src/agents/openclaw-tools.ts)
Threat: The sessions_send tool enables communication between agents, which could be used as covert channels to exfiltrate data or coordinate attacks across isolated sessions. Messages between agents bypass normal authorization controls.
Mitigation: Inter-agent communication is logged and monitored. The security audit flags configurations that enable cross-session communication without controls. Communication patterns are monitored for anomalies. Access groups can restrict which agents can communicate.
Layer 4 – Deployment & Infrastructure
Threats Identified in OpenClaw
DI-001: Gateway Binding Exposure (Critical, src/security/audit.ts)
Threat: The gateway service binds to network interfaces other than loopback without authentication when gateway.bind is not set to “loopback”. This exposes the control plane to the network, allowing anyone who can reach the host to send commands to the agent.
Mitigation: The default binding is 127.0.0.1:18789 (loopback only). The security audit detects and reports configurations that bind to other interfaces. Authentication is required for non-loopback bindings. The audit flags configurations without shared secrets when binding beyond loopback.
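The bind check described above reduces to a small decision: loopback is fine, anything else needs authentication. In the sketch below only gateway.bind and the "loopback" value come from the text; the authToken field, the finding shape, and the extra loopback aliases are assumptions.

```typescript
// Hypothetical config and finding shapes.
interface GatewayConfig {
  bind: string;       // "loopback" or an interface/address
  authToken?: string; // assumed shared-secret field
}

interface AuditFinding {
  severity: "critical" | "info";
  message: string;
}

const LOOPBACK_VALUES = new Set(["loopback", "127.0.0.1", "::1", "localhost"]);

function auditGatewayBind(cfg: GatewayConfig): AuditFinding[] {
  if (LOOPBACK_VALUES.has(cfg.bind)) return [];
  if (!cfg.authToken) {
    return [{
      severity: "critical",
      message: `gateway bound to ${cfg.bind} without authentication`,
    }];
  }
  return [{
    severity: "info",
    message: `gateway bound to ${cfg.bind}; authentication is configured`,
  }];
}
```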
DI-002: Tailscale Funnel Misconfiguration (Critical, src/gateway/gateway-cli.ts)
Threat: Tailscale Funnel mode exposes the gateway to the public internet without additional authentication. This creates a direct attack surface for anyone on the internet to attempt gateway access.
Mitigation: The security audit flags Tailscale Funnel configurations. Funnel mode requires explicit user configuration and consent. Authentication is enforced on funnel connections. The audit warns about funnel configurations without proper access controls.
DI-003: Reverse Proxy Header Spoofing (High, src/security/audit.ts)
Threat: Missing or misconfigured trustedProxies settings allow attackers to spoof IP addresses and other HTTP headers when the gateway sits behind a reverse proxy. This could bypass IP-based rate limiting, access controls, and audit logging.
Mitigation: The security audit checks proxy configuration for proper trusted proxy settings. Header validation rejects obviously spoofed values. Access logging includes source IP verification.
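To show why trustedProxies matters, here is a sketch of client IP resolution that only honors X-Forwarded-For when the direct peer is a trusted proxy, and then takes the rightmost hop that is not itself trusted. The clientIp function is hypothetical and matches addresses by exact string only (no CIDR ranges).

```typescript
function clientIp(
  remoteAddr: string,
  xForwardedFor: string | undefined,
  trustedProxies: string[],
): string {
  // If the direct peer is not a trusted proxy, the header is attacker-controlled.
  if (!xForwardedFor || !trustedProxies.includes(remoteAddr)) return remoteAddr;

  const hops = xForwardedFor
    .split(",")
    .map((h) => h.trim())
    .filter((h) => h.length > 0);

  // Walk from the right (closest to us) and skip hops we trust. Entries further
  // left could have been supplied by the client and are not believed.
  for (let i = hops.length - 1; i >= 0; i--) {
    if (!trustedProxies.includes(hops[i])) return hops[i];
  }
  return remoteAddr;
}
```

Taking the leftmost entry instead is the classic mistake: a client can prepend any address it likes, and the proxy simply appends the real one after it.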
DI-004: WebSocket Token Brute Force (Medium, src/gateway/auth.ts)
Threat: Gateway authentication tokens may be short or generated with weak randomness, allowing attackers to brute-force tokens and gain unauthorized access to agent control.
Mitigation: Token generation uses cryptographically secure random number generators. Token length meets security thresholds. Failed authentication attempts trigger rate limiting and lockouts. The security audit flags weak token configurations.
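A minimal sketch of secure token generation and comparison using Node.js built-ins. The function names and the 32-byte default are assumptions; the actual thresholds in src/gateway/auth.ts are not stated here.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// 32 random bytes gives 256 bits of entropy, rendered as 64 hex characters.
function generateToken(bytes: number = 32): string {
  return randomBytes(bytes).toString("hex");
}

// Compare in constant time. Both inputs are hashed first so the buffers have
// equal length, which timingSafeEqual requires.
function tokensMatch(presented: string, expected: string): boolean {
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}
```

Constant-time comparison does not slow down brute force by itself. It removes a timing side channel, while rate limiting and lockouts address the guessing rate.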
DI-005: Node.js Runtime Vulnerabilities (Medium, package.json)
Threat: Node.js dependencies may have known security vulnerabilities (CVEs) that could be exploited. Attackers could use these vulnerabilities to escape isolation, escalate privileges, or exfiltrate data.
Mitigation: Regular dependency audits identify known vulnerabilities. Security updates are tracked and applied. The security audit includes dependency vulnerability checks. Minimal dependency sets reduce attack surface.
DI-006: Docker Socket Exposure (High, src/config/config.sandbox-docker.test.ts)
Threat: Sandbox containers configured with Docker socket access (/var/run/docker.sock) can escape isolation and control the host’s Docker daemon, enabling container breakout, image manipulation, and cluster compromise.
Mitigation: The security audit flags Docker socket mounts in sandbox configurations. Containers should never receive socket access. Alternative isolation mechanisms are preferred. Configuration validation rejects known dangerous patterns.
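The dangerous patterns mentioned above (privileged mode, host networking, Docker socket mounts) can be checked with a few lines. The SandboxDockerConfig shape is hypothetical; the binds entries follow Docker's own hostPath:containerPath[:options] bind syntax.

```typescript
// Hypothetical sandbox config shape.
interface SandboxDockerConfig {
  privileged?: boolean;
  network?: string; // e.g. "none", "bridge", "host"
  binds?: string[]; // "hostPath:containerPath[:options]"
}

const DOCKER_SOCKETS = ["/var/run/docker.sock", "/run/docker.sock"];

function auditSandbox(cfg: SandboxDockerConfig): string[] {
  const findings: string[] = [];
  if (cfg.privileged) findings.push("container runs privileged");
  if (cfg.network === "host") findings.push("container uses host network");
  for (const bind of cfg.binds ?? []) {
    const hostPath = bind.split(":")[0];
    if (DOCKER_SOCKETS.includes(hostPath)) {
      findings.push(`Docker socket mounted: ${bind}`);
    }
    if (hostPath === "/") {
      findings.push(`host root mounted: ${bind}`);
    }
  }
  return findings;
}
```

Note that a read-only socket mount (the :ro option) is still flagged: the socket remains usable for API calls, so read-only does not make it safe.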
Layer 5 – Evaluation & Observability
Threats Identified in OpenClaw
EO-001: Insufficient Logging (High, src/security/audit.ts)
Threat: When logging.redactSensitive is set to false, audit logs may contain secrets, credentials, and sensitive user data. These logs could be accessed by unauthorized parties or exposed in log aggregation systems.
Mitigation: Sensitive data redaction is enabled by default. The security audit flags configurations with redaction disabled. Log storage should be restricted to authorized personnel. Log aggregation systems should have their own access controls.
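As an illustration of what redaction might do before a line reaches the log, the sketch below masks bearer tokens and common key=value secrets. The patterns are illustrative assumptions and far from exhaustive; they are not the rules behind logging.redactSensitive.

```typescript
// Each entry is [pattern, replacement]. Patterns are illustrative only.
const REDACTION_PATTERNS: [RegExp, string][] = [
  [/\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g, "Bearer [REDACTED]"],
  [
    /\b(api[_-]?key|token|secret|password)(["']?\s*[:=]\s*["']?)[^\s"',}]+/gi,
    "$1$2[REDACTED]",
  ],
];

function redact(line: string): string {
  return REDACTION_PATTERNS.reduce(
    (out, [pattern, replacement]) => out.replace(pattern, replacement),
    line,
  );
}
```

For example, redact('{"apiKey":"sk-123","n":1}') returns '{"apiKey":"[REDACTED]","n":1}', keeping the field name visible for debugging while dropping the value.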
EO-002: No Runtime Behavior Anomaly Detection (Medium, N/A - Gap Identified)
Threat: Without behavioral analytics, unusual tool usage patterns, prompt injection attempts, or data exfiltration go undetected. The system relies entirely on prevention rather than detection of novel attacks.
Mitigation: This is a gap that should be addressed with behavioral baseline establishment and anomaly alerting. Tool call patterns should be logged and analyzed. Sudden changes in behavior should trigger alerts. Machine learning models can identify novel attack patterns.
EO-003: Missing Safety Evaluations (Medium, N/A - Gap Identified)
Threat: Without regular red-teaming and safety evaluations, the system may have undetected vulnerabilities. New features or model updates could introduce security issues that go unnoticed until exploited.
Mitigation: An automated adversarial testing pipeline should be implemented. Regular penetration testing should cover all MAESTRO layers. Model outputs should be evaluated against safety benchmarks. Red-team findings should be tracked and remediated.
EO-004: Audit Log Tampering (Medium, src/logging.ts)
Threat: Logs stored locally without integrity protection can be modified by attackers to cover their tracks. This undermines forensic analysis and incident response.
Mitigation: JSON-formatted audit logs with integrity checksums should be implemented. Write-once storage or append-only logs prevent modification. Remote log shipping preserves integrity. Cryptographic signing of log entries provides non-repudiation.
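One way to get integrity checksums on an append-only log is a hash chain, where every entry commits to the hash of the one before it. The sketch below is a minimal version under assumed names (LogEntry, append, verify); it is a proposal-level illustration, since the text above describes this as something that should be implemented.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape.
interface LogEntry {
  seq: number;
  event: string;
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64);

function entryHash(seq: number, event: string, prevHash: string): string {
  return createHash("sha256")
    .update(JSON.stringify([seq, event, prevHash]))
    .digest("hex");
}

// Return a new log with the event appended and linked to the previous entry.
function append(log: LogEntry[], event: string): LogEntry[] {
  const prevHash = log.length > 0 ? log[log.length - 1].hash : GENESIS;
  const seq = log.length;
  return [...log, { seq, event, prevHash, hash: entryHash(seq, event, prevHash) }];
}

// Recompute every link; any edited, removed, or reordered entry breaks the chain.
function verify(log: LogEntry[]): boolean {
  let prev = GENESIS;
  return log.every((e, i) => {
    const ok =
      e.seq === i &&
      e.prevHash === prev &&
      e.hash === entryHash(e.seq, e.event, e.prevHash);
    prev = e.hash;
    return ok;
  });
}
```

A plain hash chain only detects tampering if the latest hash is anchored somewhere the attacker cannot write, for example through remote log shipping. An attacker with full write access to the local file could otherwise rebuild the whole chain.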
EO-005: Gateway Probe Failure Blindness (Low, src/security/audit.ts)
Threat: Deep security audits that probe the live gateway may fail silently, giving false confidence that the gateway is secure when it couldn’t be reached or tested.
Mitigation: The security audit structure captures probe attempts, URLs, errors, and success states. Failed probes are reported with error details. Retry logic improves reliability. Out-of-band verification is recommended for critical deployments.
Layer 6 – Security & Compliance
Threats Identified in OpenClaw
SC-001: DM Policy Misconfiguration (Critical, All Channel Monitors)
Threat: Setting dmPolicy=“open” allows anyone to send direct messages to the agent without approval. Attackers can flood the agent with malicious messages, use it for spam/phishing, or probe for vulnerabilities at scale.
Mitigation: The default dmPolicy is “pairing” which requires explicit approval. The security audit flags open DM policies. Channel monitors validate DM settings. Pairing-based access provides human oversight before anyone can interact with the agent.
SC-002: Group Chat Unauthorized Access (High, src/config/group-policy.ts)
Threat: Group chat policies that don’t properly restrict who can mention or interact with the agent may allow unauthorized users to trigger actions. Without proper access groups, anyone in a group can interact with the agent.
Mitigation: Group policy configuration in src/config/group-policy.ts enforces explicit access controls. The security audit flags configurations that allow unrestricted group access. Access groups provide granular control over who can interact with the agent in group contexts. Human approval can be required for group additions.
SC-003: Pairing Code Brute Force (Medium, src/pairing/pairing-store.ts)
Threat: 8-character pairing codes using an alphabet of 29 characters (excluding confusing characters like 0, O, 1, I) could potentially be brute-forced, though the search space is large (29^8 ≈ 5×10^11 combinations). An attacker could attempt to guess valid pairing codes to gain access to the agent.
Mitigation: Pairing codes expire after 1 hour, limiting the attack window. A cap of 3 pending pairing requests per channel keeps the number of valid codes an attacker could hit very small. Failed pairings could trigger additional rate limiting. The security audit flags unusual pairing activity.
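The search-space figure can be checked with a few lines of arithmetic. The functions below are hypothetical helpers, and the guess rate used in the usage note is an assumption chosen only to make the numbers concrete.

```typescript
// Number of possible codes: alphabetSize multiplied by itself `length` times.
// Integer multiplication stays exact here because the result is below 2^53.
function searchSpace(alphabetSize: number, length: number): number {
  let total = 1;
  for (let i = 0; i < length; i++) total *= alphabetSize;
  return total;
}

// Upper bound (union bound) on the chance that any of `guesses` attempts
// matches any of `liveCodes` currently valid codes.
function hitProbabilityBound(space: number, liveCodes: number, guesses: number): number {
  return Math.min(1, (liveCodes * guesses) / space);
}
```

searchSpace(29, 8) is 500,246,412,961, which matches the 5×10^11 estimate. With 3 live codes and an assumed 10 guesses per second sustained for the full hour (36,000 guesses), the bound is about 2×10^-7, so the TTL and the pending cap together leave brute force impractical unless the guess rate is orders of magnitude higher.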
SC-004: Identity Spoofing (Medium, src/channels/sender-identity.ts)
Threat: Sender identity may be forged or impersonated in some channels, allowing attackers to appear as legitimate users or the agent itself. This could enable social engineering attacks or inject malicious content that appears to come from trusted sources.
Mitigation: Channel-specific identity verification in src/channels/sender-identity.ts validates sender authenticity where supported. Cryptographic signatures are used when channels provide them. Audit logs record identity metadata for forensic analysis. Users should verify critical actions through out-of-band confirmation.
SC-005: Cross-Channel Correlation (Low, src/channels/)
Threat: The same user may interact with the agent across multiple channels (WhatsApp, Telegram, Discord), and without proper identity unification, the agent may not recognize the user’s full history or may inadvertently share information across channels inappropriately.
Mitigation: Identity correlation can be configured based on explicit user linking. Channel isolation defaults prevent automatic cross-channel data sharing. The security audit flags configurations that enable inappropriate correlation.
SC-006: Non-Human Identity Management (Medium, src/agents/identity.ts)
Threat: Sub-agents and automated sessions lack formal identity attestation, making it difficult to verify their legitimacy and trace their actions. This creates ambiguity about who or what is acting on behalf of the user.
Mitigation: Agent identity in src/agents/identity.ts can include cryptographic attestations. Sub-agent actions are logged with parent session context. Audit trails maintain chain of custody for agent actions.
Layer 7 – Agent Ecosystem
Threats Identified in OpenClaw
AE-001: Malicious Plugin Installation (Critical, extensions/*)
Threat: Extensions in extensions/ may contain malicious code that could exfiltrate data, execute unauthorized actions, or provide backdoor access. Plugin installation doesn’t currently require signature verification.
Mitigation: The security audit in src/security/audit.ts validates plugin integrity and checks for uncommitted changes. Code safety scanning in src/security/skill-scanner.ts runs on plugin code. Plugins should be loaded from trusted sources only. Plugin signing requirements should be enforced in production.
AE-002: Plugin Supply Chain Attack (High, extensions/*/package.json)
Threat: npm dependencies in plugins may be compromised or contain malicious code. A compromised dependency could affect all users of that plugin, potentially spreading to the core system through shared dependencies.
Mitigation: Dependency audits should be performed on all plugins. Exact versions in patches/ prevent supply chain attacks through version manipulation. The security audit flags plugins with external dependencies that lack pinning. Minimal plugin dependencies reduce attack surface.
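A pinning check of the kind described above can be sketched as a test for exact semver strings in a plugin's dependencies map. The unpinned function is hypothetical, and treating anything that is not an exact x.y.z version (ranges, tags, git URLs) as unpinned is a deliberately strict assumption.

```typescript
// Exact semver: major.minor.patch with optional prerelease and build metadata.
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/;

// Return the names of dependencies whose version spec is not an exact version.
function unpinned(deps: Record<string, string>): string[] {
  return Object.entries(deps)
    .filter(([, spec]) => !EXACT_VERSION.test(spec))
    .map(([name]) => name)
    .sort();
}
```

Pinning direct dependencies does not pin transitive ones. A committed lockfile is what fixes the full tree, so a check like this is best read as a first-pass signal.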
AE-003: Skill Registry Poisoning (Medium, src/cli/skills-cli.ts)
Threat: The ClawHub skill registry may contain malicious skills submitted by attackers. Users installing skills from the registry could execute harmful code or have their data exfiltrated.
Mitigation: Skills should be reviewed before publication to the registry. Skill signing provides authenticity verification. The skill scanner runs on all installed skills. Users should only install skills from trusted sources.
AE-004: Multi-Agent Collusion (Low, src/agents/openclaw-tools.ts)
Threat: Sub-agents created via sessions_spawn could collude to bypass security controls. Multiple agents working together might accomplish what a single agent cannot, potentially escalating privileges or exfiltrating data through coordinated actions.
Mitigation: Sub-agent actions are logged and can be correlated for collusion detection. Cross-agent communication requires explicit configuration. The security audit flags configurations that enable dangerous multi-agent scenarios. Human oversight can be required for multi-agent operations.
AE-005: Agent Impersonation (Medium, src/agents/identity.ts)
Threat: Fake agents could claim legitimate identities, tricking users into interacting with them or granting unauthorized access. Without strong identity verification, users cannot reliably distinguish legitimate agents from imposters.
Mitigation: Agent identity includes cryptographic signatures where supported. Published agent identities can be verified against known registries. Users should verify agent authenticity through out-of-band channels.
Cross-Layer Chained Threats
Critical Chain: Infra → Data → Model → Ecosystem
An attacker could compromise the gateway (DI-001) to gain access to the session store (DO-003), then poison conversation history to manipulate model context (LM-002), control sub-agents via sessions_spawn (AF-002), and spread to the ecosystem via sessions_send (AE-004).
Defense-in-Depth Mitigations:
Layer 4: Gateway auth required, loopback default, audit detects misconfigurations
Layer 2: Session store encrypted with restrictive permissions, audit checks filesystem permissions
Layer 1: Context compaction removes old messages, input sanitization strips injections
Layer 3: Sandbox mode limits tool access, elevated commands require approval
Layer 7: Sub-agent spawn requires explicit allowlist, inter-agent communication monitored
OpenClaw Built-in Security Command Reference
The following CLI commands can be run regularly in addition to MAESTRO threat modeling.
# Run comprehensive security audit
openclaw security audit --deep
# Check specific areas
openclaw doctor # Configuration issues
openclaw channels status --probe # Channel connectivity
openclaw status --deep # Gateway health check
# Pairing management
openclaw pairing list # List pending requests
openclaw pairing approve <channel> <code> # Approve sender
openclaw pairing reject <channel> <code> # Reject sender



Interesting analysis, Ken. Do you think having "police agents" floating in the social system, observing activity in the channels and flagging channels (along with the corresponding agents) that show potential vulnerabilities, could give early warnings and prevent the spread of a threat by isolating those channels?
Brilliant analysis and solutions provided to potential critical threats.