Air-Gapped Private AI: OpenClaw + Ollama for Sensitive Data
For when βwe promise the data isnβt storedβ isnβt good enough.
The Problem with Cloud AI for Sensitive Data
Cloud AI is convenient. Itβs also a permanent, uncontrolled data leak.
When you send employee records to an AI API to summarize, or paste a legal contract into an online model β that data travels to someone elseβs servers. Theirs. Their subcontractors. Their training pipelines. You have no visibility and no recall. For most business work thatβs fine. For HR documents, medical records, financial statements, proprietary source code, or anything with PII β itβs a compliance nightmare dressed up as productivity.
The answer isnβt βbe more careful.β The answer is architecture: the data never leaves the machine.
What This Setup Does
You run OpenClaw on a machine that has zero network access. Ollama handles LLM inference locally. OpenClaw handles the workflow, tool access, and orchestration. No outbound connections. No API calls. No data leaves.
βββββββββββββββββββββββββββββββββββββββββββββββββββ
β AIR-GAPPED MACHINE β
β β
β ββββββββββββββββ ββββββββββββββββ β
β β OpenClaw βββββββΆβ Ollama β β
β β (agent) ββββββββ (local LLM) β β
β ββββββββββββββββ ββββββββββββββββ β
β β
β All processing happens locally. β
β Zero network access. Zero data egress. β
βββββββββββββββββββββββββββββββββββββββββββββββββββ
How It Works
1. Set Up Ollama
Install Ollama on your isolated machine and pull a capable model before severing the network:
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull models (do this before air-gapping)
ollama pull qwen3.5:27b # 18GB β RECOMMENDED: best balance of quality + RAM
ollama pull qwen3.5:14b # 9GB β strong alternative for 16GB RAM machines
ollama pull qwen3.5:4b # 2.7GB β no-GPU fallback, still surprisingly capable
ollama pull llama3.2-vision # vision model β see Vision section below
# Start the local API (drop-in compatible with OpenAI API)
ollama serve
Ollamaβs local API runs at http://localhost:11434. Point your OpenClaw config to it with no changes to prompt structure.
2. Configure OpenClaw to Use the Local Engine
# openclaw.yaml
providers:
- name: ollama-local
provider: ollama
api_base: http://localhost:11434
model: qwen3.5:27b
tasks:
sensitive-document-review:
provider: ollama-local
prompt: |
Review this HR document and flag any compliance issues,
missing fields, or language that could create liability.
Summarize findings in a table. Flag critical issues first.
3. Sever the Network
Option A β Hardware air-gap (strongest): Physically remove or disable the Wi-Fi card and ethernet adapter. For a server, donβt connect it to any network at all.
Option B β Software firewall (belt and suspenders):
# Block all outbound traffic β last line of defense
sudo iptables -P INPUT DROP
sudo iptables -P FORWARD DROP
sudo iptables -P OUTPUT DROP
sudo iptables -A INPUT -i lo -j ACCEPT # loopback only
sudo iptables -A OUTPUT -o lo -j ACCEPT
Option C β Dedicated VLAN: Isolate on a private VLAN with no internet egress and no access to sensitive network segments.
For anything touching genuinely sensitive data: Option A + Option B.
4. Run Your Workflows
tasks:
hr-document-review:
provider: ollama-local
triggers:
- cron: "0 9 * * 1-5"
steps:
- read: "/data/hr/pending/*.pdf"
- prompt: |
Review each document for:
- Compliance gaps vs. current labor law
- Missing required fields or signatures
- Language that could create liability
Summarize findings in a table. Flag critical issues first.
- write: "/data/hr/reviews/daily-summary.md"
Which Local Model Should You Run?
Qwen 3.5 is Alibabaβs strongest release to date and the current recommended default for local inference. It comes in several sizes β all share the same architecture improvements over Qwen 2.5, including significantly better reasoning and instruction following.
Recommended Models (March 2026)
| Model | Size on Disk | RAM Required | Context | Best For |
|---|---|---|---|---|
| Qwen 3.5 27B β | ~18 GB | 24β32 GB | 256K | Recommended default. Best quality-per-RAM ratio. Desktop/gpu workstation. |
| Qwen 3.5 14B | ~9 GB | 16 GB | 256K | Mid-range hardware. Still excellent at document processing. |
| Qwen 3.5 4B | ~2.7 GB | 6β8 GB | 32K | No-GPU laptops. Surprisingly capable for simple tasks. |
| Qwen 3.5 122B MoE | ~55 GB | 64+ GB | 256K | Server-grade. 10B active params = Qwen 3.5 14B capability at lower inference cost. |
| Llama 3.3 8B | ~4.9 GB | 8 GB | 128K | Budget/legacy option. Qwen 3.5 4B beats it at lower RAM. |
| DeepSeek R1 32B | ~23 GB | 32 GB | 128K | Hard reasoning tasks. Best for multi-step math and logic. |
Start with Qwen 3.5 27B. If you have 32GB RAM on your machine, itβs the highest-quality local model available for document processing, classification, and summarization without a server-grade setup.
GPU VRAM Requirements
Speed matters. Running on CPU works, but GPU inference is 10β30x faster. Hereβs what you need for real-time interaction:
| Setup | VRAM | GPU Examples | Inference Speed |
|---|---|---|---|
| Qwen 3.5 4B (Q4) | 4β6 GB | RTX 3060, M1 Mac | 30β50 tok/sec |
| Qwen 3.5 14B (Q4) | 8β10 GB | RTX 3080, RTX 4060 Ti | 20β35 tok/sec |
| Qwen 3.5 27B (Q4) | 16β20 GB | RTX 4090, A5000, M3 Pro | 15β25 tok/sec |
| Qwen 3.5 27B (Q8) | 28β32 GB | RTX 4090 (24GB), A100 40GB | 20β30 tok/sec |
| Qwen 3.5 122B MoE (Q4) | 14β18 GB active | RTX 4090, A5000 | 25β40 tok/sec |
Without a GPU, CPU inference on Qwen 3.5 27B runs at 2β5 tokens/second β usable for batch processing, painful for interactive use. Prioritize the 14B model if youβre CPU-only.
Vision-Capable Local Models
Need to process screenshots, scanned documents, photos of receipts, or visual inspection tasks? Several local models handle this natively:
ollama pull qwen2.5vl:14b # 9GB β best vision model for Ollama, 128K context
ollama pull qwen2.5vl:72b # 45GB β strongest vision, requires 48GB+ RAM
ollama pull llama3.2-vision:11b # 7.5GB β solid vision, Apple Silicon friendly
ollama pull llama3.2-vision:90b # 55GB β close to cloud vision quality
ollama pull gemma-3-27b # 18GB β Google model, multimodal + strong reasoning
Vision models let you do things like:
- Parse handwritten forms or stamped documents from photos
- Analyze screenshots of dashboards or UIs
- Process scanned contracts with mixed handwriting and print
- Inspect visual outputs of automated systems
All fully offline. For an air-gapped security setup, adding vision means you can process paper documents, whiteboard photos, and physical evidence β not just digital files.
Tool-Use Capable Local Models
Modern agents need more than text generation β they need to call tools, use browsers, execute code. Several open-weight models now handle this:
| Model | Tool Use | Browser Use | Best For |
|---|---|---|---|
| Qwen 3.5 (all sizes) | β Native | β via browser tool | Recommended default for agentic workflows |
| Llama 4 Scout | β | β | Massive 10M token context, best for large document agents |
| DeepSeek R1 | β | β (limited) | Strong reasoning + tool use, lower cost |
| Mistral Small 4 | β | β | Fast tool-use cycles, real-time agentic tasks |
| Qwen 2.5 Coder | β | β | Code-focused agent workflows |
Qwen 3.5 has the most robust tool-use implementation of any open-weight model as of early 2026. For OpenClaw workflows that need to use tools (file I/O, code execution, API calls), Qwen 3.5 is the recommended choice.
Traditional Setup vs. Air-Gapped
| Cloud AI | Air-Gapped Local | |
|---|---|---|
| Data leaves your network | β | β Never |
| API costs | β Per-token fees | β Zero |
| Internet required | β | β None |
| Compliance (GDPR, HIPAA, etc.) | β οΈ Complex DPA required | β Guaranteed data sovereignty |
| Model capability | βοΈ Frontier β best on hard reasoning | β Excellent for most tasks |
| Vision support | β | β (with local vision models) |
| Tool use / agentic workflows | β | β (Qwen 3.5, Llama 4) |
| Real-time web access | β | β (by design β intentional) |
| GPU required for speed | β | β Recommended |
| Hardware investment | β None | β Upfront cost |
Limitations to Know
- Frontier reasoning gap: On very hard multi-step reasoning (PhD-level math, cutting-edge code), frontier cloud models still lead slightly. For document review, classification, summarization, and most business tasks β local models are at effective parity.
- No real-time information: No web access by design. If workflows need current data, run a second OpenClaw instance with internet access for that work only, with proper data separation.
- Hardware investment: Qwen 3.5 27B at full quality needs 32GB RAM. Budget accordingly β a workstation with 64GB or an M3 Pro MacBook Pro handles it comfortably.
- No automatic updates: The machine is isolated. Model updates require physical media or a one-time controlled transfer.
What This Doesnβt Replace
Air-gapping is a physical security control β not a magic privacy button. It doesnβt protect against:
- A malicious local user with terminal access
- Keyloggers or malware on the machine itself
- Physical theft (full-disk encryption mitigates this)
- Insiders with legitimate access who misuse it
Combine air-gapping with standard access controls, audit logging, and need-to-know principles. The air-gap is your last line of defense β not your only one.
Is This Overkill?
For most people, yes. For data that actually matters β the stuff youβd lose sleep over if it leaked β itβs the only architecture that actually guarantees what you need.
Most βprivate AIβ solutions are really just βwe promise.β Air-gapping is βwe couldnβt send this data out even if we wanted to.β
Thatβs a meaningful difference.
Want to try this with OpenClaw?
OpenClaw is free and open source. Get started at openclaw.ai
Try OpenClaw β