A deep technical dive into how OpenClaw works under the hood. Learn the patterns that power every serious AI agent system being built today.
OpenClaw is not a chatbot that responds to prompts. It's a local gateway process that runs on your machine and acts as the control plane for AI agent execution.
You bring your own LLM API key (Claude, GPT-4, DeepSeek, local Ollama) and your own messaging platform (WhatsApp, Telegram, Slack, Discord, Signal, iMessage). OpenClaw orchestrates everything in between.
The result: an always-on AI agent that can read files, run shell commands, control browsers, send emails, manage calendars, all triggered by a message.
Most "AI agents" are just API wrappers. OpenClaw is infrastructure. It handles sessions, routing, persistent state, tool execution, memory management, and multi-channel connectivity. The LLM provides intelligence; OpenClaw provides the execution environment.
The Gateway is a single long-lived background process that acts as the control plane for everything in OpenClaw. It's the "single source of truth" for sessions, routing, channel connections, and authentication.
Think of it as a WebSocket server running on your machine (default: ws://127.0.0.1:18789). Every message, whether it comes from WhatsApp, Slack, Discord, or your keyboard, flows through the Gateway first.
┌─────────────────────────────────────────────────────────────┐
│ GATEWAY (Control Plane) │
│ │
│ ┌───────────────┐ ┌────────────────┐ ┌──────────────┐ │
│ │ Channel │ │ Router & │ │ Agent │ │
│ │ Adapters │→ │ Session │→ │ Runtime │ │
│ │ │ │ Manager │ │ │ │
│ └───────────────┘ └────────────────┘ └──────────────┘ │
│ ▲ │ │
│ WhatsApp Agentic Loop │
│ Telegram (ReAct Pattern) │
│ Discord │ │
│ iMessage ▼ │
│ Slack, etc. Tool Execution │
│ │ │
└─────────────────────────────────────────────────┼───────────┘
│
┌────────────┴──────┐
▼ ▼
Browser Shell/File
Automation System
OpenClaw processes messages within a session one at a time, not in parallel, via a per-session Command Queue. Why? Because concurrent tool execution can corrupt state: if two messages run simultaneously, they might modify the same file or trigger conflicting actions. By serializing, the Gateway guarantees consistency and prevents race conditions.
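A minimal sketch of the idea (not OpenClaw's actual implementation): one asyncio worker per session drains its queue in order, so commands for the same session never interleave, while different sessions still run concurrently.

```python
import asyncio
from collections import defaultdict

class SessionQueues:
    """One worker task per session: commands for the same session run
    strictly in order, while different sessions proceed concurrently."""

    def __init__(self):
        self.queues = defaultdict(asyncio.Queue)
        self.workers = {}

    async def submit(self, session_id, command):
        # Wrap the command in a future so callers can await its result.
        fut = asyncio.get_running_loop().create_future()
        await self.queues[session_id].put((command, fut))
        if session_id not in self.workers:
            self.workers[session_id] = asyncio.create_task(self._worker(session_id))
        return await fut

    async def _worker(self, session_id):
        q = self.queues[session_id]
        while True:
            command, fut = await q.get()
            try:
                fut.set_result(await command())  # run to completion before the next command
            except Exception as e:
                fut.set_exception(e)
            q.task_done()
```

The key property is that `_worker` awaits each command to completion before dequeuing the next, which is exactly the serialization guarantee described above.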
ReAct = Reason + Act. This is the core pattern that separates agents from chatbots.
A chatbot receives a prompt, generates a response, and stops. An agent receives input, reasons about it, calls a tool, observes the result, reasons again, calls another tool if needed, and continues this loop until the task is complete.
while True:
    response = llm.call(context)              # one reasoning step
    if response.is_text():
        send_reply(response.text)             # final answer: exit the loop
        break
    if response.is_tool_call():
        tool_result = execute_tool(response.tool_name, response.params)
        context.add_message("tool_result", tool_result)
        # Loop continues: the model sees the result and decides the next action
- System prompt: core instructions the agent always follows. Defines personality, constraints, and default behaviors.
- Skills manifest: a compact list of eligible skills (name, description, file path). The model reads this list and decides which skills are relevant for the current task. Full skill files are loaded on-demand only.
- Workspace configuration: workspace-level configuration files that provide environment context (available integrations, permissions, workspace settings).
- Run-specific instructions: ad-hoc instructions injected for a specific execution (e.g., "use this specific API key" or "prioritize speed over cost").
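A sketch of how these pieces might be assembled into one context package per turn (function and field names are illustrative, not OpenClaw's API):

```python
def assemble_context(system_prompt, skills, workspace_cfg, run_instructions, history):
    """Build the message list sent to the LLM for one turn.
    Only skill *metadata* goes in; full SKILL.md files are read on demand."""
    skill_manifest = "\n".join(
        f"- {s['name']}: {s['description']} ({s['path']})" for s in skills
    )
    system = "\n\n".join(filter(None, [
        system_prompt,
        "Available skills:\n" + skill_manifest if skills else "",
        "Workspace:\n" + workspace_cfg,
        run_instructions,
    ]))
    # Conversation history goes after the assembled system message.
    return [{"role": "system", "content": system}, *history]
```

Note the design choice this makes explicit: everything the model "knows" for this turn is a deliberate, inspectable string you built.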
The model has no eyes, no direct access to files or APIs. Everything it knows comes from context. Context assembly is arguably the most important engineering decision in any agent system. OpenClaw's architecture makes this explicit: building the right context package is how you make agents work reliably.
OpenClaw implements security through an allowlist model: tools are only available if explicitly enabled in configuration. Dangerous shell structures (like recursive deletes) are hard-blocked. File access is sandboxed to specific directories. The philosophy: start restrictive, grant permissions deliberately.
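A toy version of that allowlist gate (policy shapes and helper names are hypothetical; OpenClaw's actual config schema will differ):

```python
import re

# Hypothetical policy: start restrictive, enable tools deliberately.
ALLOWED_TOOLS = {"read_file", "web_fetch", "shell"}   # shell enabled on purpose for the demo
BLOCKED_SHELL = [re.compile(r"\brm\s+-rf\b")]         # hard-blocked dangerous patterns
SANDBOX_ROOTS = ["/home/alice/agent-workspace"]       # file access sandbox

def authorize(tool, params):
    """Return (allowed, reason) for a proposed tool call."""
    if tool not in ALLOWED_TOOLS:
        return False, f"tool '{tool}' not in allowlist"
    if tool == "shell":
        for pat in BLOCKED_SHELL:
            if pat.search(params.get("command", "")):
                return False, "blocked shell pattern"
    if tool == "read_file":
        path = params.get("path", "")
        if not any(path.startswith(root) for root in SANDBOX_ROOTS):
            return False, "path outside sandbox"
    return True, "ok"
```

The check runs before execution, so even a prompt-injected tool call hits the same gate.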
A Skill is a folder containing a SKILL.md file: natural-language instructions that teach the agent how to handle a specific domain (e.g., GitHub PR review, Slack message triage, email management).
OpenClaw doesn't inject the full text of every skill into the system prompt. Instead, it injects a compact list (name, description, path) and lets the model decide which skills are relevant. When a skill is needed, the model can read its SKILL.md on-demand.
---
name: github-pr-reviewer
description: Review GitHub PRs and post feedback
---
# GitHub PR Reviewer
When asked to review a pull request:
1. Use web_fetch to retrieve PR diff
2. Analyze for correctness and security
3. Structure review as: Summary, Issues, Suggestions
4. If asked, post review to GitHub API
Always be constructive.
Loading only the skills you need prevents context bloat. This is especially important for token-expensive models. OpenClaw's approach: keep skill metadata compact, load detail on-demand. It's similar to how modern package managers work: you don't load every function of a library into memory at startup.
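The two halves of that pattern, as a minimal sketch (assuming simple `key: value` frontmatter like the example above; real frontmatter parsing would use a YAML library):

```python
from pathlib import Path

def skill_summary(skill_dir):
    """Parse SKILL.md frontmatter into the compact metadata that goes
    into the system prompt. Sketch: assumes flat 'key: value' lines."""
    text = (Path(skill_dir) / "SKILL.md").read_text()
    _, frontmatter, body = text.split("---", 2)
    meta = dict(line.split(":", 1) for line in frontmatter.strip().splitlines())
    return {k.strip(): v.strip() for k, v in meta.items()} | {"path": str(skill_dir)}

def load_skill(skill_dir):
    # Called only when the model decides the skill is relevant to the task.
    return (Path(skill_dir) / "SKILL.md").read_text()
```

`skill_summary` is cheap and runs for every skill at startup; `load_skill` is the on-demand step that pays the token cost only when needed.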
OpenClaw uses a simple, auditable memory architecture based on plain files:
Every turn in a conversation is appended as one JSON object per line to a JSONL file. The result is a factual audit trail: what was said, what tools were called, what results came back. You can read, grep, and replay these logs.
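Appending and replaying such a transcript is deliberately trivial; a sketch (record fields are illustrative, not OpenClaw's exact schema):

```python
import json

def append_turn(log_path, role, content, tool_calls=None):
    """Append one conversation turn as a single JSON line (JSONL)."""
    record = {"role": role, "content": content, "tool_calls": tool_calls or []}
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")

def replay(log_path):
    """Read the full session back as a list of turn records."""
    with open(log_path) as f:
        return [json.loads(line) for line in f]
```

Because each line is self-contained JSON, ordinary tools (`grep`, `jq`, `tail -f`) work on the log with no special tooling.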
A single Markdown file where the agent writes important facts it should remember:
## User Preferences
- Name: Alice
- Timezone: America/New_York
- Prefers concise responses
- Hobby: Machine Learning
## Work Context
- Role: Engineering Manager at TechCorp
- Team size: 8
- Current priorities: API optimization, hiring
Query Memory Database (QMD) adds semantic search: "that project we discussed in January" retrieves the conversation even without exact keywords. Combines vector similarity + keyword matching for precision.
Markdown memory is human-readable and human-editable. You can open MEMORY.md in any text editor, fix errors, delete stale info, or manually add context. It's not locked in a database. This is a conscious design choice: make agent state transparent and portable.
When you mention "that thing from last week," the agent seems to remember it perfectly. Here's how:
| Traditional Approach | OpenClaw Approach |
|---|---|
| Stores everything. No search. | Stores everything. Hybrid search (semantic + keyword). |
| Model has to re-read entire history. | Model only gets relevant context retrieved via search. |
| Context window fills up. Hallucinations increase. | Fixed context for past memories. Reduces hallucination. |
| Expensive (many tokens per query). | Cost-effective (targeted retrieval). |
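A toy illustration of hybrid scoring (the real QMD uses learned embeddings for the semantic half; bag-of-words cosine here is just a stand-in):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(query, memories, alpha=0.5):
    """Rank memories by a blend of keyword overlap and vector similarity."""
    q_tokens = query.lower().split()
    q_vec = Counter(q_tokens)
    scored = []
    for m in memories:
        m_tokens = m.lower().split()
        keyword = len(set(q_tokens) & set(m_tokens)) / max(len(set(q_tokens)), 1)
        semantic = cosine(q_vec, Counter(m_tokens))
        scored.append((alpha * keyword + (1 - alpha) * semantic, m))
    return [m for score, m in sorted(scored, reverse=True)]
```

Only the top-ranked memories get injected into context, which is what keeps past-memory token usage fixed regardless of history length.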
OpenClaw's power comes with serious security implications. A compromised or misconfigured OpenClaw instance can read your files, run arbitrary shell commands, control your browser, and act through any messaging account you've connected.
OpenClaw is not for casual users. It requires security literacy and careful configuration. One maintainer warned: "If you can't understand how to run a command line, this is far too dangerous for you."
Tools are disabled by default. Enable only what you need. Grant file access to specific directories only. Don't give the agent access to credential files or private keys.
Run OpenClaw inside a container (Docker) or isolated environment so it can't reach system-wide resources. This is especially important if running on a personal machine with sensitive data.
Keep the Gateway WebSocket server bound to localhost (127.0.0.1) by default. Don't expose it to the internet unless you're running in a hardened environment.
Prompt injection attacks embed harmful instructions in data (e.g., in email subject lines or file contents). OpenClaw mitigates this with the same allowlist and sandboxing controls described above: even if an injected instruction reaches the model, the tool layer limits what it can actually execute.
OpenClaw stores all session history as JSONL. Monitor logs for suspicious tool calls. Use tools like CrowdStrike Falcon or similar to detect internet-exposed OpenClaw instances.
OpenClaw runs locally, but where does your message data go? That depends on your LLM choice. If you use Claude, data goes to Anthropic. If you use a local model (Ollama, LM Studio), data stays on your machine. Choose wisely based on your threat model.
# Install Node.js 18+ from nodejs.org
# Then open PowerShell and run:
npm install -g openclaw
# Initialize OpenClaw (creates config directory)
openclaw init
# Start the Gateway
openclaw serve
OpenClaw creates a config file at ~/.openclaw/config.yml.
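A hypothetical config.yml sketch to show the shape of what lives there (field names are illustrative; check the generated file for the actual schema):

```yaml
gateway:
  host: 127.0.0.1        # keep the WebSocket server bound to localhost
  port: 18789
model:
  provider: anthropic    # or openai, deepseek, ollama
  api_key_env: ANTHROPIC_API_KEY
channels:
  whatsapp:
    enabled: false       # enable, restart, then scan the QR code
tools:
  allowlist: [read_file, web_fetch]
  file_access:
    roots: ["~/agent-workspace"]
```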
WSL2 Recommended: While OpenClaw runs on Windows directly, using WSL2 (Windows Subsystem for Linux) is cleaner for development and tool execution. WSL2 handles shell commands more reliably and matches Linux semantics that many tools expect.
PowerShell: Use PowerShell (not CMD) for running OpenClaw commands. It handles npm scripts better.
Path Issues: Windows uses backslashes. OpenClaw normalizes paths, but be careful when configuring file access permissions: use forward slashes or escaped backslashes.
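Python's `pathlib` makes this normalization a one-liner, which is handy when generating config values programmatically:

```python
from pathlib import PureWindowsPath

def normalize_win_path(p: str) -> str:
    """Convert a Windows-style path to forward slashes for config files."""
    return PureWindowsPath(p).as_posix()
```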
Once running, the Gateway exposes a web UI at http://localhost:3000 by default. To connect a channel such as WhatsApp:
1. Enable WhatsApp in config.yml under channels
2. Restart OpenClaw
3. Scan the QR code that appears in the terminal with your phone
4. Send a test message from WhatsApp to your bot
5. Watch the magic happen: the agent receives the message, processes it, and responds
Costs depend on your model choice and usage:
| Setup | Cost/Day | Notes |
|---|---|---|
| Claude via Anthropic API | $2–15 | Depends on model (Haiku vs Opus) and token usage. Agentic loops use more tokens. |
| GPT-4 via OpenAI | $5–50 | More expensive per token than Claude. Longer agentic chains = higher cost. |
| OpenRouter (routing) | $1–10 | Most cost-effective. Routes to cheapest model that meets quality requirements. |
| Local Model (Ollama) | $0 | Free, but requires GPU or CPU power. Inference is slower. Quality varies by model. |
Use OpenRouter if cost is a concern. It intelligently routes requests to the best value model. For heavy users, local models (Ollama + Mistral 7B or similar) offer nearly free inference at the cost of slower responses.
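A back-of-envelope estimator for the table above (the per-million-token prices here are hypothetical placeholders; substitute your provider's current pricing):

```python
# Illustrative prices only, USD per million tokens; check your provider.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}

def daily_cost(turns_per_day, input_tokens_per_turn, output_tokens_per_turn):
    """Rough daily API cost. Agentic loops inflate input tokens because
    each tool-call iteration re-sends the growing context."""
    cost = turns_per_day * (
        input_tokens_per_turn * PRICE_PER_MTOK["input"]
        + output_tokens_per_turn * PRICE_PER_MTOK["output"]
    ) / 1_000_000
    return round(cost, 2)
```

For example, 100 turns a day at 4,000 input and 500 output tokens per turn lands within the table's Claude range at these placeholder prices.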
- Gateway as control plane: a single process orchestrates everything. This separation (Gateway vs Agent Runtime) is critical. Real agent systems always have an orchestration layer.
- Serialized execution: messages in a session are processed one at a time. This prevents race conditions and keeps state consistent. Concurrency is a system-level decision, not the default.
- The ReAct loop: every serious agent uses Reason + Act: model reasons, calls a tool, observes the result, loops. This is the defining pattern.
- Context engineering: the model only knows what you put in context. Building the right context package is the highest-leverage engineering decision.
- Progressive skill loading: inject metadata, load detail on-demand. Prevents context bloat and token waste.
- Transparent memory: Markdown files you can read and edit. Makes agent behavior auditable and controllable, not a black box.
- Security by default: OpenClaw is powerful precisely because it's dangerous. Allowlists, sandboxing, and monitoring are non-negotiable.
OpenClaw isn't a hobby project. The patterns it implements (Gateway, serialized queues, context management, memory systems) are the same ones powering enterprise agent systems.