
🦞 OpenClaw Architecture

A deep technical dive into how OpenClaw works under the hood. Learn the patterns that power every serious AI agent system being built today.

What Is OpenClaw Really?

Not a Chatbot, An Orchestration Platform

OpenClaw is not a chatbot that responds to prompts. It's a local gateway process that runs on your machine and acts as the control plane for AI agent execution.

You bring your own LLM API key (Claude, GPT-4, DeepSeek, local Ollama) and your own messaging platform (WhatsApp, Telegram, Slack, Discord, Signal, iMessage). OpenClaw orchestrates everything in between.

The result: an always-on AI agent that can read files, run shell commands, control browsers, send emails, and manage calendars, all triggered by a message.

Key Distinction

Most "AI agents" are just API wrappers. OpenClaw is infrastructure. It handles sessions, routing, persistent state, tool execution, memory management, and multi-channel connectivity. The LLM provides intelligence; OpenClaw provides the execution environment.

Core Characteristics

✓ Self-Hosted
Everything runs locally. Your files, memory, and configuration stay on your machine.
✓ Model-Agnostic
Works with Claude, GPT-4, Gemini, and local models via Ollama. Bring your own API key.
✓ Multi-Channel
Access the same agent through WhatsApp, Slack, Discord, iMessage, Signal, etc.
✓ Always-On
Runs as a background process. Acts without waiting for prompts via scheduled cron jobs.
✓ Persistent Memory
Stores conversations, long-term memory, and identity as plain Markdown files you can edit.
✓ Open Source
MIT licensed. No vendor lock-in. Read, modify, and extend the source code.

The Gateway: The Nervous System

What Is the Gateway?

The Gateway is a single long-lived background process that acts as the control plane for everything in OpenClaw. It's the "single source of truth" for sessions, routing, channel connections, and authentication.

Think of it as a WebSocket server running on your machine (default: ws://127.0.0.1:18789). Every message, whether it comes from WhatsApp, Slack, Discord, or your keyboard, flows through the Gateway first.

┌─────────────────────────────────────────────────────────────┐
│                        GATEWAY (Control Plane)               │
│                                                               │
│  ┌───────────────┐  ┌────────────────┐  ┌──────────────┐   │
│  │  Channel      │  │  Router &      │  │ Agent        │   │
│  │  Adapters     │→ │  Session       │→ │ Runtime      │   │
│  │               │  │  Manager       │  │              │   │
│  └───────────────┘  └────────────────┘  └──────────────┘   │
│       ▲                                          │            │
│  WhatsApp                              Agentic Loop         │
│  Telegram                            (ReAct Pattern)        │
│  Discord                                        │            │
│  iMessage                                      ▼            │
│  Slack, etc.                          Tool Execution       │
│                                                 │            │
└─────────────────────────────────────────────────┼───────────┘
                                                   │
                                      ┌────────────┴──────┐
                                      ▼                   ▼
                                  Browser          Shell/File
                                  Automation       System

How Messages Flow Through the Gateway

1
Channel Input
Message arrives from WhatsApp, Telegram, Slack, etc. Channel adapter normalizes it into a consistent format.
2
Routing & Session
Gateway routes to appropriate agent. Message is added to session queue (serialized, one at a time).
3
Context Assembly
Agent runtime assembles context: base prompt + eligible skills + memory + session history.
4
Model Inference
Context sent to LLM (Claude, GPT-4, etc.). Model decides: reply with text or call a tool.
5
Tool Execution
If model called a tool, runtime executes it and feeds result back to context. Loop continues.
6
Response & Persistence
Model generates final reply, sent back to user. Session and memory updated and saved.
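The six steps above can be sketched as a single dispatch function. All names here are illustrative, not OpenClaw's actual internals:

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    """Normalized message a channel adapter produces (step 1).
    Field names are illustrative, not OpenClaw's actual schema."""
    channel: str
    sender: str
    text: str

@dataclass
class Session:
    history: list = field(default_factory=list)

def handle_message(msg: Message, sessions: dict, llm) -> str:
    # Step 2: route to the session keyed by (channel, sender).
    session = sessions.setdefault((msg.channel, msg.sender), Session())
    # Step 3: assemble context: base prompt + skills + memory + history.
    context = ["<base prompt>", "<skills list>", *session.history, msg.text]
    # Steps 4-5: model inference; the tool-execution loop is elided here.
    reply = llm(context)
    # Step 6: persist the turn and send the reply back to the channel.
    session.history += [msg.text, reply]
    return reply
```

The key point the sketch preserves: the session, not the channel, is the unit of state, so the same agent answers consistently no matter where the message came from.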
Design Pattern: Serialized Execution Per Session

OpenClaw processes messages in a single session one at a time, not in parallel. This is handled by a Command Queue per session. Why? Because concurrent tool execution can corrupt state: if two messages run simultaneously, they might try to modify the same file or trigger conflicting actions. By serializing, the Gateway guarantees consistency and prevents race conditions.
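A minimal sketch of such a per-session queue, assuming a thread-per-session worker (the real implementation differs):

```python
import queue
import threading

class SessionQueue:
    """Serializes message handling for one session: a sketch of the
    per-session Command Queue idea. Names are illustrative."""
    def __init__(self, handler):
        self.handler = handler
        self.q = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        while True:
            msg = self.q.get()
            if msg is None:       # shutdown sentinel
                break
            self.handler(msg)     # one message at a time: no races
            self.q.task_done()

    def submit(self, msg):
        self.q.put(msg)           # callers never block each other

    def close(self):
        self.q.put(None)
        self.worker.join()
```

Submitting is non-blocking; only execution is serialized. Different sessions get different queues, so cross-session concurrency is still possible when the system chooses to allow it.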

The Agentic Loop: The Brain

What Is the ReAct Loop?

ReAct = Reason + Act. This is the core pattern that separates agents from chatbots.

A chatbot receives a prompt, generates a response, and stops. An agent receives input, reasons about it, calls a tool, observes the result, reasons again, calls another tool if needed, and continues this loop until the task is complete.

while True:
    response = llm.call(context)
    if response.is_text():
        send_reply(response.text)
        break
    if response.is_tool_call():
        tool_result = execute_tool(response.tool_name, response.params)
        context.add_message("tool_result", tool_result)
        # Loop continues: model sees result and decides next action

The Four Stages of Context Assembly

1. Base System Prompt

Core instructions the agent always follows. Defines personality, constraints, and default behaviors.

2. Skills Prompt

A compact list of eligible skills (name, description, file path). The model reads this list and decides which skills are relevant for the current task. Full skill files are loaded on-demand only.

3. Bootstrap Context

Workspace-level configuration files that provide environment context (available integrations, permissions, workspace settings).

4. Per-Run Overrides

Ad-hoc instructions injected for a specific execution (e.g., "use this specific API key" or "prioritize speed over cost").
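The four stages can be sketched as a simple assembly function (field names, ordering, and separators are assumptions for illustration):

```python
def assemble_context(base_prompt, skills, bootstrap, history, overrides=None):
    """Builds the context package in the four stages described above."""
    parts = [base_prompt]                               # 1. base system prompt
    if skills:
        listing = "\n".join(f"- {s['name']}: {s['description']} ({s['path']})"
                            for s in skills)
        parts.append("Available skills:\n" + listing)   # 2. compact skills list
    parts.extend(bootstrap)                             # 3. workspace context
    if overrides:
        parts.append(overrides)                         # 4. per-run overrides
    parts.extend(history)                               # session history last
    return "\n\n".join(parts)
```

Note that the skills stage injects only one line per skill; the full SKILL.md never enters the base package.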

Why Context Assembly Matters

The model has no eyes, no direct access to files or APIs. Everything it knows comes from context. Context assembly is arguably the most important engineering decision in any agent system. OpenClaw's architecture makes this explicit: building the right context package is how you make agents work reliably.

The Tool Execution Engine

Types of Tools Available

  • Shell/CLI: Run terminal commands (with security constraints)
  • File System: Read, write, delete files in allowed directories
  • Browser: Navigate websites, extract content, fill forms
  • Canvas: Visual interface for drawing, diagramming, or UI interactions
  • Cron/Scheduler: Schedule tasks to run at specific times
  • API Integrations: Pre-built connectors to external services (Gmail, GitHub, Slack, etc.)
  • Session Management: Control conversation state and memory
Tool Execution Safety

OpenClaw implements security through an allowlist model: tools are only available if explicitly enabled in configuration. Dangerous shell patterns (like recursive deletes) are hard-blocked. File access is sandboxed to specific directories. The philosophy: start restrictive, grant permissions deliberately.

Skills: On-Demand Knowledge Loading

What Is a Skill?

A Skill is a folder containing a SKILL.md file: natural-language instructions that teach the agent how to handle a specific domain (e.g., GitHub PR review, Slack message triage, email management).

OpenClaw doesn't inject the full text of every skill into the system prompt. Instead, it injects a compact list (name, description, path) and lets the model decide which skills are relevant. When a skill is needed, the model can read its SKILL.md on-demand.

---
name: github-pr-reviewer
description: Review GitHub PRs and post feedback
---

# GitHub PR Reviewer

When asked to review a pull request:
1. Use web_fetch to retrieve PR diff
2. Analyze for correctness and security
3. Structure review as: Summary, Issues, Suggestions
4. If asked, post review to GitHub API

Always be constructive.
Smart Context Management

Loading only the skills you need prevents context bloat. This is especially important for token-expensive models. OpenClaw's approach: keep skill metadata compact, load detail on-demand. It's similar to how modern package managers work: you don't load every function of a library into memory at startup.

Memory: Persistent & Inspectable

Three Levels of Memory

OpenClaw uses a simple, auditable memory architecture based on plain files:

1. Session History (JSONL)

Every turn in a conversation is recorded as a line in a JSONL file. It's a factual audit trail: what was said, what tools were called, what results came back. You can read, grep, and replay these logs.

2. Long-Term Memory (MEMORY.md)

A single Markdown file where the agent writes important facts it should remember:

## User Preferences
- Name: Alice
- Timezone: America/New_York
- Prefers concise responses
- Hobby: Machine Learning

## Work Context
- Role: Engineering Manager at TechCorp
- Team size: 8
- Current priorities: API optimization, hiring

3. Vector Search (Experimental QMD)

Query Memory Database (QMD) adds semantic search: "that project we discussed in January" retrieves the conversation even without exact keywords. Combines vector similarity + keyword matching for precision.
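The hybrid idea in miniature: blend keyword overlap with vector cosine similarity. The weighting and scoring below are illustrative, not QMD's actual formula:

```python
import math

def hybrid_score(query_terms, doc_terms, q_vec, d_vec, alpha=0.5):
    """Blends keyword overlap with cosine similarity between embeddings.
    alpha weights keyword vs semantic matching (illustrative sketch)."""
    overlap = len(set(query_terms) & set(doc_terms)) / max(len(set(query_terms)), 1)
    dot = sum(a * b for a, b in zip(q_vec, d_vec))
    norm = math.sqrt(sum(a * a for a in q_vec)) * math.sqrt(sum(b * b for b in d_vec))
    cosine = dot / norm if norm else 0.0
    return alpha * overlap + (1 - alpha) * cosine
```

The semantic half is what lets "that project we discussed in January" match a conversation that never used the word "project"; the keyword half keeps exact names and identifiers precise.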

Why Markdown for Memory?

Markdown memory is human-readable and human-editable. You can open MEMORY.md in any text editor, fix errors, delete stale info, or manually add context. It's not locked in a database. This is a conscious design choice: make agent state transparent and portable.

How Memory Feels Magic

When you mention "that thing from last week," the agent seems to remember it perfectly. Here's how:

Traditional Approach                              | OpenClaw Approach
Stores everything. No search.                     | Stores everything. Hybrid search (semantic + keyword).
Model has to re-read entire history.              | Model only gets relevant context retrieved via search.
Context window fills up. Hallucinations increase. | Fixed context for past memories. Reduces hallucination.
Expensive (many tokens per query).                | Cost-effective (targeted retrieval).

Security: The Real Trade-offs

⚠️ Critical Warning

OpenClaw's power comes with serious security implications. A compromised or misconfigured OpenClaw instance can:

  • Execute arbitrary shell commands
  • Exfiltrate sensitive files
  • Access email and calendar accounts
  • Be hijacked via prompt injection attacks

OpenClaw is not for casual users. It requires security literacy and careful configuration. One maintainer warned: "If you can't understand how to run a command line, this is far too dangerous for you."

Security Best Practices

1. Allowlist Model

Tools are disabled by default. Enable only what you need. Grant file access to specific directories only. Don't give the agent access to credential files or private keys.

2. Sandboxing

Run OpenClaw inside a container (Docker) or isolated environment so it can't reach system-wide resources. This is especially important if running on a personal machine with sensitive data.

3. Network Control

Keep the Gateway WebSocket server bound to localhost (127.0.0.1) by default. Don't expose it to the internet unless you're running in a hardened environment.

4. Prompt Injection Defense

Prompt injection attacks embed harmful instructions in data (e.g., in email subject lines or file contents). OpenClaw mitigates this by:

  • Hard-blocking dangerous shell patterns (recursive rm, etc.)
  • Validating tool calls before execution
  • Using semantic snapshots (parsing accessibility trees) instead of raw screenshots for web tasks
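These mitigations can be sketched as a validation gate run before any tool executes. The patterns and tool names below are illustrative, not OpenClaw's actual rule set:

```python
import re

# Illustrative denylist layered on top of the allowlist: patterns
# hard-blocked regardless of which tools are enabled.
BLOCKED_PATTERNS = [
    r"rm\s+-rf\s+/",     # recursive delete from root
    r">\s*/dev/sd",      # writing to raw block devices
]

def validate_tool_call(tool_name, args, allowed_tools):
    """Returns (ok, reason). Default-deny: unknown tools are rejected."""
    if tool_name not in allowed_tools:
        return False, f"tool '{tool_name}' not enabled"
    if tool_name == "shell":
        cmd = args.get("command", "")
        for pat in BLOCKED_PATTERNS:
            if re.search(pat, cmd):
                return False, "blocked dangerous shell pattern"
    return True, "ok"
```

The important property is ordering: validation happens after the model proposes a tool call but before anything touches the system, so even a successfully injected prompt still has to get past the gate.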

5. Monitor & Log

OpenClaw stores all session history as JSONL. Monitor logs for suspicious tool calls. Use tools like CrowdStrike Falcon or similar to detect internet-exposed OpenClaw instances.

Privacy & Model Choice

OpenClaw runs locally, but where does your message data go? That depends on your LLM choice. If you use Claude, data goes to Anthropic. If you use a local model (Ollama, LM Studio), data stays on your machine. Choose wisely based on your threat model.

Running OpenClaw on Windows

System Requirements

  • Node.js 18+: OpenClaw is built in TypeScript/JavaScript. Download from nodejs.org
  • npm or yarn: Comes with Node.js
  • API Key: Claude, GPT-4, Gemini, or local model endpoint
  • Messaging Setup: WhatsApp (Baileys library), Telegram, Discord, etc.
# Install Node.js 18+ from nodejs.org # Then open PowerShell and run: npm install -g openclaw # Initialize OpenClaw (creates config directory) openclaw init # Start the Gateway openclaw serve

Configuration (config.yml)

OpenClaw creates a config file at ~/.openclaw/config.yml. Key sections:

  • models: Your LLM provider (Anthropic, OpenAI, etc.) and API keys
  • channels: Enable messaging platforms (WhatsApp, Telegram, Discord, etc.)
  • workspace: Path to your files and skills
  • tools: Which capabilities to enable (shell, browser, file system, etc.)
  • permissions: Security policies (allowed directories, blocked commands)
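A hypothetical config.yml following those sections. The exact keys will differ, so consult the file that openclaw init generates:

```yaml
# Illustrative sketch of ~/.openclaw/config.yml (keys are assumptions)
models:
  provider: anthropic
  api_key: ${ANTHROPIC_API_KEY}
channels:
  whatsapp:
    enabled: true
workspace:
  path: ~/openclaw-workspace
tools:
  shell: true
  browser: false
permissions:
  allowed_dirs:
    - ~/openclaw-workspace
  blocked_commands:
    - "rm -rf"
```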
Windows-Specific Notes

WSL2 Recommended: While OpenClaw runs on Windows directly, using WSL2 (Windows Subsystem for Linux) is cleaner for development and tool execution. WSL2 handles shell commands more reliably and matches Linux semantics that many tools expect.

PowerShell: Use PowerShell (not CMD) for running OpenClaw commands. It handles npm scripts better.

Path Issues: Windows uses backslashes. OpenClaw normalizes paths, but be careful when configuring file access permissions: use forward slashes or escaped backslashes.
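One way to normalize, sketched with Python's pathlib (purely illustrative; OpenClaw does its own normalization):

```python
from pathlib import PureWindowsPath

def normalize_win_path(raw: str) -> str:
    """Converts a Windows path to forward slashes, the safer form to
    put in permission configs. Accepts either slash style."""
    return PureWindowsPath(raw).as_posix()
```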

Testing Your Setup

Once running, the Gateway exposes a web UI at http://localhost:3000 by default. You can:

  • Chat with the agent directly in the browser
  • View session history
  • Edit memory and configuration
  • Monitor tool execution and logs

Connecting Your First Channel (WhatsApp Example)

1. Enable WhatsApp in config.yml under channels

2. Restart OpenClaw

3. Scan the QR code that appears in the terminal with your phone

4. Send a test message from WhatsApp to your bot

5. Watch the magic happen: the agent receives it, processes it, and responds

Cost Reality Check

Running OpenClaw Isn't Free (If Using Cloud APIs)

Costs depend on your model choice and usage:

Setup                    | Cost/Day | Notes
Claude via Anthropic API | $2–15    | Depends on model (Haiku vs Opus) and token usage. Agentic loops use more tokens.
GPT-4 via OpenAI         | $5–50    | More expensive per token than Claude. Longer agentic chains = higher cost.
OpenRouter (routing)     | $1–10    | Most cost-effective. Routes to cheapest model that meets quality requirements.
Local Model (Ollama)     | $0       | Free, but requires GPU or CPU power. Inference is slower. Quality varies by model.
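The table's figures are back-of-envelope estimates; here is the arithmetic behind them (prices per million tokens are illustrative, check your provider's current pricing):

```python
def daily_cost(tokens_in, tokens_out, price_in_per_mtok, price_out_per_mtok):
    """Back-of-envelope daily API cost. Prices are per million tokens
    and purely illustrative."""
    return (tokens_in * price_in_per_mtok
            + tokens_out * price_out_per_mtok) / 1_000_000

# e.g. 2M input + 200K output tokens/day at $3 in / $15 out per Mtok:
# daily_cost(2_000_000, 200_000, 3.0, 15.0) -> 9.0 dollars/day
```

Agentic loops inflate the input side quickly, since every tool result is fed back into context, which is why serialized multi-step tasks cost more than single-turn chat.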
Cost Optimization

Use OpenRouter if cost is a concern. It intelligently routes requests to the best value model. For heavy users, local models (Ollama + Mistral 7B or similar) offer nearly free inference at the cost of slower responses.

Key Takeaways: Understanding the Architecture

1. Gateway = Control Plane

A single process orchestrates everything. This separation (Gateway vs Agent Runtime) is critical. Real agent systems always have an orchestration layer.

2. Sessions are Serialized

Messages in a session are processed one at a time. This prevents race conditions and keeps state consistent. Concurrency is a system-level decision, not the default.

3. ReAct Loop is Standard

Every serious agent uses Reason + Act: model reasons, calls a tool, observes result, loops. This is the defining pattern.

4. Context Assembly is Everything

The model only knows what you put in context. Building the right context package is the highest-leverage engineering decision.

5. Skills are Lazy-Loaded

Inject metadata, load detail on-demand. Prevents context bloat and token waste.

6. Memory is Transparent

Markdown files you can read and edit. Makes agent behavior auditable and controllable, not a black box.

7. Security Requires Active Hardening

OpenClaw is powerful precisely because it's dangerous. Allowlists, sandboxing, and monitoring are non-negotiable.

8. This Is Production-Grade Architecture

OpenClaw isn't a hobby project. The patterns it implements (Gateway, serialized queues, context management, memory systems) are the same ones powering enterprise agent systems.

About the Author
Barnabas Waweru
Systems Architect · Founder of The Deep Family

Deep technical explorations of software architecture, AI systems, networking protocols, and the engineering decisions that power the systems we depend on every day.
