I was excited and scared of OpenClaw, so I locked it down.

I watched OpenClaw delete a Meta director's inbox and decided I needed a kill switch before the agent decides for me.

Feb 26, 2026

I like OpenClaw. I use it for plenty of personal things: calling me with reminders about doctor appointments, searching for open-source GitHub projects, you name it! It’s fast, it’s hackable, and it connects to basically everything.

And then it deleted a Meta director’s email.

‘This should terrify you’: Meta Superintelligence safety director lost control of her AI agent—it deleted her emails

If you missed it: in February 2026, an OpenClaw agent connected to a Meta AI director’s inbox went on a speed-run. It mass-deleted emails, ignored stop commands, ran up costs within minutes, and kept going even after the user tried to shut it down. The context window had been compacted and the agent lost track of its original instructions. It just… decided deleting was the task.

That scared me. Not because OpenClaw is broken — it’s a great agent runtime. But because there’s nothing between the agent and the API. No filter on what tools the model sees. No cost ceiling. No way to remotely kill a run. No record of what happened that you could trust after the fact. It’s a straight pipe from agent to OpenAI, and if the agent goes sideways, you find out when the damage is done.

So I built a way to put a wall in front of it.

The actual problem

OpenClaw sends your LLM requests directly to OpenAI. The model sees every tool you registered — including delete_emails, bulk_remove, drop_table, whatever you’ve wired up. If the model decides to call one, it calls it. There’s no checkpoint, no approval, no “hey, are you sure?”

And there’s no audit trail. If something goes wrong, you’re digging through stdout logs trying to reconstruct what the agent did, in what order, with whose data. Good luck.

What I wanted was simple:

Don’t let the model see tools it shouldn’t use. Not “block the call after it happens.” Remove the tool from the request before the model knows it exists. It can’t call delete_emails if it was never told about delete_emails.
Cap the spend. Daily, monthly, per-request. When the budget’s done, the gateway says no.
Record everything. Every request, every denial, every tool that got stripped. Signed, queryable, trustworthy.
Keep my real API key out of OpenClaw. OpenClaw gets a caller token. The real key lives in an encrypted vault and gets injected at forward time.
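
To make the first point concrete, here is a minimal Python sketch of "remove the tool before the model sees it". It is an illustration of the idea, not Talon's actual code: the function names, the pattern list, and the assumption that tools arrive in the Chat Completions `tools` shape are all mine.

```python
from fnmatch import fnmatchcase

# Hypothetical blocklist, using the same glob style as the config later in this post.
FORBIDDEN_PATTERNS = ["delete_*", "bulk_*", "drop_*"]


def tool_name(tool: dict) -> str:
    """Return the function name from a Chat Completions style tool entry."""
    return tool.get("function", {}).get("name", "")


def filter_tools(body: dict, patterns: list[str]) -> tuple[dict, list[str]]:
    """Return a copy of the request body without forbidden tools, plus the names removed."""
    kept, removed = [], []
    for tool in body.get("tools", []):
        name = tool_name(tool)
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            removed.append(name)  # remembered so the denial can be logged
        else:
            kept.append(tool)
    filtered = {**body, "tools": kept}
    if not kept:
        # Drop the key entirely rather than forwarding an empty tool list.
        filtered.pop("tools")
    return filtered, removed
```

The original body is left untouched and the removed names are returned separately, so the caller can forward one and write the other to the audit log.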

How I set it up

I built this into Dativo Talon — a single Go binary that sits between OpenClaw and OpenAI. Here’s the exact setup I run.

Step 1: Install and init

go install github.com/dativo-io/talon/cmd/talon@latest

mkdir talon-openclaw && cd talon-openclaw
talon init --pack openclaw --name openclaw-gateway

This generates two files: agent.talon.yaml (server policy) and talon.config.yaml (gateway config). The gateway config is where the real controls live.

Step 2: Store your OpenAI key in the vault

export TALON_SECRETS_KEY=$(openssl rand -hex 32)  # save this somewhere safe
talon secrets set openai-api-key "$OPENAI_API_KEY"

Your real OpenAI key is now encrypted at rest. OpenClaw will never see it.

Step 3: Start the gateway

talon serve --gateway

That’s it. Talon is now listening on localhost:8080.

Step 4: Point OpenClaw at Talon

In ~/.openclaw/openclaw.json:

{
  "models": {
    "providers": {
      "openai": {
        "baseUrl": "http://localhost:8080/v1/proxy/openai/v1",
        "apiKey": "talon-gw-openclaw-001",
        "api": "openai-responses",
        "models": [
          { "id": "gpt-4o", "name": "gpt-4o" },
          { "id": "gpt-4o-mini", "name": "gpt-4o-mini" }
        ]
      }
    }
  }
}

Notice the apiKey — that’s the caller token, not the OpenAI key. Talon identifies OpenClaw by this token and injects the real key when it forwards to OpenAI.
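
The token swap can be sketched in a few lines of Python. This is a hedged illustration, not Talon's implementation: the in-memory `CALLERS` and `VAULT` dictionaries stand in for the caller registry and the encrypted vault, and the placeholder key value is made up.

```python
# Hypothetical in-memory stand-ins for the caller registry and the encrypted vault.
CALLERS = {"talon-gw-openclaw-001": "openclaw-main"}
VAULT = {"openai-api-key": "sk-real-key-placeholder"}


class UnknownCaller(Exception):
    """Raised when the request carries no recognized caller token."""


def swap_credentials(headers: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Authenticate the caller token, then build upstream headers with the real key."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or token not in CALLERS:
        raise UnknownCaller("missing or unrecognized caller token")
    # Copy every header except the caller's Authorization, then inject the real key.
    upstream = {k: v for k, v in headers.items() if k != "Authorization"}
    upstream["Authorization"] = f"Bearer {VAULT['openai-api-key']}"
    return CALLERS[token], upstream
```

The caller only ever holds the gateway token, so leaking the OpenClaw config exposes a credential that can be revoked at the gateway without rotating the provider key.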

Restart OpenClaw (openclaw gateway stop && openclaw gateway start) and you’re running through the gateway.

The config that would have stopped the inbox incident

Here’s the talon.config.yaml I use. I’ll walk through the parts that matter.

gateway:
  enabled: true
  listen_prefix: "/v1/proxy"
  mode: "enforce"

  providers:
    openai:
      enabled: true
      secret_name: "openai-api-key"
      base_url: "https://api.openai.com"
      allowed_models: ["gpt-4o", "gpt-4o-mini", "gpt-4-turbo"]

  callers:
    - name: "openclaw-main"
      api_key: "talon-gw-openclaw-001"
      tenant_id: "default"
      team: "engineering"
      allowed_providers: ["openai"]
      policy_overrides:
        max_daily_cost: 25.00
        max_monthly_cost: 500.00
        pii_action: "redact"
        allowed_models: ["gpt-4o", "gpt-4o-mini", "gpt-4-turbo"]

  default_policy:
    require_caller_id: true
    log_prompts: true

    # --- THIS IS THE BIG ONE ---
    # Tool governance: strip dangerous tools BEFORE the model sees them.
    tool_policy_action: "filter"
    forbidden_tools:
      - "delete_*"
      - "admin_*"
      - "export_all_*"
      - "bulk_*"
      - "rm_*"
      - "drop_*"

    # PII: redact personal data from requests headed to OpenAI
    default_pii_action: "redact"
    response_pii_action: "warn"

    # Attachments: scan PDFs and CSVs for prompt injection
    attachment_policy:
      action: "warn"
      injection_action: "block"
      max_file_size_mb: 10

  rate_limits:
    global_requests_per_min: 300
    per_caller_requests_per_min: 60

  timeouts:
    connect_timeout: 10s
    request_timeout: 120s
    stream_idle_timeout: 60s

Let me break down what each piece would have done in the email incident:

forbidden_tools: ["delete_*", "bulk_*"] — The agent had access to delete_email, delete_thread, and bulk operations. With this config, Talon strips those tools from the JSON body before OpenAI ever sees them. The model literally cannot decide to delete anything because it doesn’t know deletion is an option.
tool_policy_action: "filter" — This is the mode. "filter" silently removes forbidden tools and forwards the rest. If you want to be more aggressive, set it to "block" — that rejects the entire request if any forbidden tool is present. I prefer "filter" because it keeps the agent functional for everything except the dangerous stuff.
max_daily_cost: 25.00 — The incident ran up significant cost in minutes. This cap shuts the door after $25/day for this caller. Done. No negotiation.
per_caller_requests_per_min: 60 — The agent was firing requests as fast as it could. Rate limiting slows a runaway agent to a manageable pace and gives you time to notice.
request_timeout: 120s — No single request gets more than 2 minutes. The agent can’t sit in an infinite loop waiting for a response.
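
The cost cap and the rate limit both come down to an admission check that runs before a request is forwarded. Here is a rough Python sketch under my own assumptions: the class name, the reason strings, and the idea of passing in an estimated cost are hypothetical, and the daily reset of the spend counter is left out for brevity.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CallerLimits:
    max_daily_cost: float = 25.00
    requests_per_min: int = 60
    spent_today: float = 0.0
    recent: deque = field(default_factory=deque)  # timestamps of admitted requests

    def admit(self, now: float, estimated_cost: float) -> tuple[bool, str]:
        """Decide whether one request may be forwarded at time `now` (in seconds)."""
        # Forget requests that fell out of the 60-second sliding window.
        while self.recent and now - self.recent[0] >= 60:
            self.recent.popleft()
        if len(self.recent) >= self.requests_per_min:
            return False, "rate_limited"
        if self.spent_today + estimated_cost > self.max_daily_cost:
            return False, "daily_budget_exceeded"
        # Only admitted requests count against the window and the budget.
        self.recent.append(now)
        self.spent_today += estimated_cost
        return True, "ok"
```

A runaway loop hits "rate_limited" within the first minute and "daily_budget_exceeded" soon after, and both denials happen before any money is spent upstream.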

Per-caller tool allowlists (when you want to be strict)

forbidden_tools is a blocklist. You can also go the other direction and use a strict allowlist, where only the tools you name get through:

callers:
  - name: "openclaw-main"
    policy_overrides:
      allowed_tools: ["search_web", "read_file", "list_files", "create_draft"]
      tool_policy_action: "block"

Now OpenClaw can only use those four tools. Any request that carries anything else (delete_emails, send_email, admin_reset, whatever) gets rejected before it reaches OpenAI, so the model never sees those tools. This is the nuclear option, and it’s the one I’d use if I were connecting an agent to anyone’s inbox.
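
The difference between "block" and "filter" under an allowlist is easy to see in code. The Python sketch below is illustrative only: the 403 status and the error payload shape are my assumptions, not Talon's documented responses.

```python
ALLOWED_TOOLS = {"search_web", "read_file", "list_files", "create_draft"}


def enforce_allowlist(body: dict, allowed: set[str], action: str = "block") -> tuple[int, dict]:
    """Return (status, body): reject the request in "block" mode, strip extras in "filter" mode."""
    tools = body.get("tools", [])
    extras = [t["function"]["name"] for t in tools if t["function"]["name"] not in allowed]
    if not extras:
        return 200, body  # nothing outside the allowlist, forward as-is
    if action == "block":
        # Hypothetical error shape: the whole request is refused.
        return 403, {"error": {"type": "tool_not_allowed", "tools": extras}}
    kept = [t for t in tools if t["function"]["name"] in allowed]
    return 200, {**body, "tools": kept}
```

With "block", a single unexpected tool fails the request loudly, which is what you want when a new tool showing up in the agent's registry should be treated as an incident rather than quietly tolerated.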

Verify it works

Send a request with a dangerous tool and watch what happens:

curl -s -X POST http://localhost:8080/v1/proxy/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer talon-gw-openclaw-001" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role":"user","content":"Clean up my inbox"}],
    "tools": [
      {"type":"function","function":{"name":"search_web","parameters":{}}},
      {"type":"function","function":{"name":"delete_emails","parameters":{}}}
    ]
  }'

delete_emails gets stripped. The model only sees search_web. Check the evidence:

talon audit list --agent openclaw-main --limit 5

You’ll see exactly which tools were requested, which were filtered, and which were forwarded. Signed and timestamped.
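
For a sense of what "signed" buys you, here is a small Python sketch of a tamper-evident audit record using HMAC-SHA256. The record fields and the signing scheme are assumptions for illustration; this post does not specify how Talon actually signs its evidence.

```python
import hashlib
import hmac
import json


def _canonical(record: dict) -> bytes:
    """Serialize deterministically so the same record always yields the same bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def sign_record(record: dict, key: bytes) -> dict:
    """Return a copy of the record with an HMAC-SHA256 signature attached."""
    signature = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return {**record, "signature": signature}


def verify_record(signed: dict, key: bytes) -> bool:
    """Recompute the signature over everything except the signature field."""
    record = {k: v for k, v in signed.items() if k != "signature"}
    expected = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed.get("signature", ""))
```

If anyone edits a stored record after the fact, for example to hide that delete_emails was requested, verification fails unless they also hold the signing key.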

When this isn’t the answer

You’re just playing around. If it’s a hobby project and nothing is at stake, the gateway is overhead you don’t need. Especially if you have unlimited money and nothing to hide ;)
You trust the tool set completely. If your agent only has read-only tools — no delete, no write, no send — the risk profile is lower. Still worth auditing, but the urgency is different.
You need governance inside MCP tool calls. The gateway governs what goes to and from the LLM. If you need policy on every individual tool invocation (not just what the model is told about), that’s Talon’s MCP proxy — a different deployment shape.

Final thought

The email incident wasn’t a bug in OpenClaw. It was a missing layer. The agent did exactly what agents do — it picked from the tools it was given and executed. The problem is it was given delete_emails and nobody was standing between the model and that tool.

That’s what I’m solving. Not replacing OpenClaw; I still use it every day (fingers crossed my tool catches all the dangerous stuff). Just making sure it runs through a gateway that strips the dangerous tools, caps the cost, and writes down everything that happened. If something goes wrong, I want to know exactly what the agent tried to do and exactly where it was stopped.

talon init --pack openclaw. Fifteen minutes. That’s the difference between “the agent deleted everything” and “the agent tried to delete everything and was told no.”

Data, Engineering, and Beyond
