
Local LLMs with OpenClaw: Ollama, Llama 3.3, Qwen 3.5 & MiniMax M2.5 – A Practical Benchmark
TL;DR: "Local LLMs with OpenClaw are production-ready in 2026. Llama 3.3 is the all-rounder, Qwen 3.5 the efficiency champion, MiniMax M2.5 the coding beast. All run via Ollama – no cloud, no cost, no privacy trade-offs."
— Till Freitag
Why Local LLMs?
Cloud APIs are convenient – but they come with three problems:
- Cost: GPT-4o costs ~$15 per million output tokens. With heavy agent use, $300–700/month is realistic.
- Privacy: Every API call sends data to US servers. GDPR-compliant? Only with a data processing agreement and risk assessment.
- Dependency: API down? Rate limit reached? Your agent stops working.
Local LLMs solve all three problems. And in 2026, they're finally good enough for production use.
30-second version: Install Ollama, pull a model, connect OpenClaw – done. No API key, no per-token cost, no data shared with third parties.
The Candidates
We tested four models suitable for local use with OpenClaw:
| Model | Provider | Parameters | Active Params | Context | Architecture |
|---|---|---|---|---|---|
| Llama 3.3 | Meta | 70B | 70B | 128K | Dense |
| Qwen 3.5 27B | Alibaba | 27B | 27B | 256K | Dense |
| Qwen 3.5 35B-A3B | Alibaba | 35B | 3B | 256K | MoE |
| MiniMax M2.5 | MiniMax | 230B | 10B | 200K | MoE |
What Does MoE Mean?
Mixture of Experts (MoE) is the trick behind the newer models: MiniMax M2.5, for example, has 230B parameters in total, but a router activates only about 10B of them per token. The result: GPT-4-level quality at a fraction of the compute per token.
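The arithmetic behind that claim is easy to sketch (illustrative Python; parameter counts taken from the table above):

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Fraction of the model's weights touched per generated token."""
    return active_params_b / total_params_b

# MiniMax M2.5 (MoE): 230B total, ~10B active per token
print(f"{active_fraction(230, 10):.1%}")  # → 4.3%

# Llama 3.3 70B (dense): every parameter participates in every token
print(f"{active_fraction(70, 70):.1%}")   # → 100.0%
```

Per-token compute (and thus speed) tracks the active parameters, while quality benefits from the full parameter count.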
Installation via Ollama
All models can be downloaded with a single command:
```bash
# Install Ollama (if not already installed)
curl -fsSL https://ollama.com/install.sh | sh

# Pull models
ollama pull llama3.3       # 40 GB download – needs 48 GB RAM
ollama pull qwen3.5:27b    # 16 GB – runs on 22 GB RAM
ollama pull qwen3.5:35b    # 20 GB – only 3B active (MoE)
ollama pull minimax-m2.5   # 101 GB (3-bit) – needs 64+ GB RAM, 128 GB recommended
```
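The download sizes above follow directly from parameter count and quantization width. A rough rule of thumb (the effective bits-per-weight values are assumptions; real K-quants carry some metadata overhead):

```python
def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights alone (no KV cache, no runtime)."""
    return params_b * bits_per_weight / 8

# "4-bit" quants typically land near 4.5 effective bits per weight
print(round(weights_gb(70, 4.5)))    # Llama 3.3 → 39 (GB, vs. ~40 GB observed)
print(round(weights_gb(230, 3.5)))   # MiniMax "3-bit" → 101 (GB)
```

Plan for roughly 20% extra RAM on top of the weights for the KV cache and runtime, which is why a 40 GB model wants 48 GB of RAM.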
Connect to OpenClaw
```bash
openclaw config set models.providers.ollama.apiKey "ollama-local"
openclaw config set agents.defaults.model.primary "ollama/qwen3.5:27b"
```
Performance Benchmarks
Tested on Apple M3 Max (128 GB RAM) and NVIDIA RTX 4090 (24 GB VRAM):
Speed (Tokens/Second)
| Model | M3 Max (128 GB) | RTX 4090 (24 GB) | Notes |
|---|---|---|---|
| Llama 3.3 70B | ~18 t/s | ~25 t/s | Needs a lot of RAM |
| Qwen 3.5 27B | ~35 t/s | ~55 t/s | Best speed/quality trade-off |
| Qwen 3.5 35B-A3B | ~60 t/s | ~80 t/s | MoE turbo: only 3B active |
| MiniMax M2.5 | ~15 t/s | Not possible* | Needs >24 GB VRAM |
*MiniMax M2.5 requires at least 64 GB RAM or a multi-GPU setup.
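What do these throughput numbers mean in practice? A quick estimate of wall-clock time for a typical agent reply, using the M3 Max figures from the table (prompt processing time ignored for simplicity):

```python
def response_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Rough generation time, ignoring prompt-processing latency."""
    return output_tokens / tokens_per_second

# A ~500-token agent reply at the measured speeds:
for model, tps in [("qwen3.5:35b", 60), ("qwen3.5:27b", 35), ("llama3.3", 18)]:
    print(f"{model}: {response_seconds(500, tps):.0f} s")
```

Anything above ~30 t/s feels interactive; below ~20 t/s you will notice the wait on longer replies.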
Quality (Benchmarks)
| Model | MMLU-Pro | HumanEval | SWE-Bench | Agentic Use |
|---|---|---|---|---|
| Llama 3.3 70B | 68.9 | 82.5 | – | ★★★★☆ |
| Qwen 3.5 27B | 71.2 | 85.1 | – | ★★★★☆ |
| Qwen 3.5 35B-A3B | 69.5 | 83.8 | – | ★★★★☆ |
| MiniMax M2.5 | 74.1 | 89.3 | 80.2% | ★★★★★ |
Result: Qwen 3.5 27B offers the best trade-off between speed, quality, and resource consumption. MiniMax M2.5 is the strongest model but requires significantly more hardware.
Cost Comparison: Cloud vs. Local
Cloud Costs (per month, estimated at 50M tokens)
| Provider | Model | Input | Output | Total/Month |
|---|---|---|---|---|
| OpenAI | GPT-4o | $2.50/1M | $10/1M | ~$300 |
| Anthropic | Claude 3.5 Sonnet | $3/1M | $15/1M | ~$400 |
| OpenAI | GPT-4o mini | $0.15/1M | $0.60/1M | ~$20 |
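The monthly totals above can be reproduced with simple per-million-token pricing (the 50/50 input/output split is our assumption; your agent's mix will vary):

```python
def monthly_cost(tokens_in_m: float, tokens_out_m: float,
                 price_in: float, price_out: float) -> float:
    """USD per month: millions of tokens times price per million tokens."""
    return tokens_in_m * price_in + tokens_out_m * price_out

# 50M tokens/month, assumed 25M in / 25M out
print(monthly_cost(25, 25, 2.50, 10))    # GPT-4o → 312.5 (~$300)
print(monthly_cost(25, 25, 0.15, 0.60))  # GPT-4o mini → 18.75 (~$20)
```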
Local Costs (one-time + electricity)
| Setup | Hardware | One-time | Electricity/Month | Break-Even |
|---|---|---|---|---|
| Mac mini M4 Pro | 48 GB RAM | ~$2,400 | ~$15 | 7–8 months |
| Mac Studio M3 Max | 128 GB RAM | ~$4,900 | ~$25 | 12–15 months |
| Linux Server + RTX 4090 | 64 GB RAM | ~$3,200 | ~$40 | 8–10 months |
| Raspberry Pi 5 | 8 GB RAM | ~$130 | ~$5 | 1 month |
Bottom line: After ~8 months, self-hosting is cheaper than any cloud API. With heavy usage (>100M tokens/month), break-even drops to 3–4 months.
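The break-even figures fall out of one division (hardware and electricity numbers from the tables above; the $700/month heavy-usage figure matches the cost range cited at the start of the article):

```python
def break_even_months(hardware_usd: float, cloud_monthly: float,
                      electricity_monthly: float) -> float:
    """Months until the one-time hardware cost beats recurring cloud spend."""
    return hardware_usd / (cloud_monthly - electricity_monthly)

# Mac mini M4 Pro vs. a ~$300/month GPT-4o bill
print(round(break_even_months(2400, 300, 15), 1))  # → 8.4
# Heavy usage (~$700/month cloud) pulls break-even forward
print(round(break_even_months(2400, 700, 15), 1))  # → 3.5
```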
Offline Scenarios
Local LLMs have one decisive advantage no cloud can offer: they work without internet.
When Is Offline Relevant?
- On the road: On trains, planes, construction sites – anywhere without stable internet
- Air-gapped environments: Security-critical infrastructure (government, military, healthcare)
- Edge deployments: IoT gateways, factory floors, remote offices
- Resilience: When the cloud API goes down, your agent keeps running
Recommended Offline Setup
```bash
# Compact model for offline use on modest hardware
ollama pull qwen3.5:35b    # MoE: only 3B active, runs on 22 GB RAM

# For Raspberry Pi / edge devices
ollama pull phi-3:mini     # 3.8B parameters, 4 GB RAM
```
OpenClaw Offline Config
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/qwen3.5:35b",
        "fallbacks": ["ollama/phi-3:mini"]
      }
    }
  },
  "network": {
    "offline_mode": true,
    "web_search": false
  }
}
```
Which Model for Which Use Case?
| Use Case | Recommended Model | Why |
|---|---|---|
| Email triage | Qwen 3.5 27B | Fast, 256K context for long threads |
| Code analysis | MiniMax M2.5 | SWE-Bench 80.2%, best coding model |
| Quick responses | Qwen 3.5 35B-A3B | MoE: 60+ t/s on Apple Silicon |
| Summarization | Llama 3.3 70B | Solid quality, broad language understanding |
| Offline / edge | Qwen 3.5 35B-A3B | MoE + 256K context at low resource use |
| Raspberry Pi | Phi-3 Mini | Only model under 4 GB RAM |
Qwen 3.5: The Newcomer in Detail
Alibaba's Qwen 3.5 deserves special attention. The model family brings several firsts in 2026:
- 256K context: Twice as much as Llama 3.3 – ideal for long email threads or document analysis
- 201 languages: A true multilingual model, perfect for international teams
- Multimodal: The 27B and 122B variants can also process images
- Thinking mode: Built-in chain-of-thought reasoning, toggleable per parameter
- MoE variants: 35B-A3B activates only 3B parameters – runs on a MacBook Air
```bash
# Enable thinking mode (for complex tasks)
ollama run qwen3.5:27b --thinking
```
MiniMax M2.5: The Coding Beast
MiniMax M2.5 from Shanghai took the AI community by surprise:
- SWE-Bench Verified: 80.2% – on par with Claude Opus 4.6
- 230B parameters, 10B active: MoE architecture for efficiency
- Agentic design: Natively optimized for tool calling and search
- 200K context: Enough for complete codebases
The catch: You need at least 64 GB RAM (ideally 128 GB) for the 3-bit quantized model. But if you have the hardware, you get a model that competes with the best cloud APIs – at zero cost.
```bash
# MiniMax M2.5 via Ollama (needs a lot of RAM!)
ollama pull minimax-m2.5
openclaw config set agents.defaults.model.primary "ollama/minimax-m2.5"
```
Hybrid Strategy: Best of Both Worlds
Our recommendation for productive teams:
| Task | Model | Local/Cloud |
|---|---|---|
| Email & customer data | Qwen 3.5 27B | 🏠 Local |
| Code reviews | MiniMax M2.5 | 🏠 Local |
| Quick routine tasks | Qwen 3.5 35B-A3B | 🏠 Local |
| Complex analysis (non-sensitive) | Claude 3.5 Sonnet | ☁️ Cloud |
| Image generation | DALL-E 3 / Flux | ☁️ Cloud |
Rule of thumb: Personal data → always local. Everything else → based on budget and quality requirements.
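That rule of thumb is simple enough to encode directly. A hypothetical routing helper (the function and task names are ours; the model identifiers follow the article's examples):

```python
# Sketch of the hybrid routing rule: personal data never leaves the machine.
def pick_model(task: str, contains_personal_data: bool) -> str:
    if contains_personal_data:
        return "ollama/qwen3.5:27b"           # personal data → always local
    if task == "code_review":
        return "ollama/minimax-m2.5"          # local coding specialist
    if task == "complex_analysis":
        return "anthropic/claude-3.5-sonnet"  # cloud, non-sensitive work only
    return "ollama/qwen3.5:35b"               # fast local default (MoE)

print(pick_model("email_triage", contains_personal_data=True))
# → ollama/qwen3.5:27b
```

The key design point: the privacy check comes first and is unconditional, so no budget or quality consideration can ever route customer data to a cloud API.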
Conclusion
Local LLMs are no longer a compromise in 2026 – they're a strategic decision. With Qwen 3.5 as the efficiency champion, MiniMax M2.5 as the coding powerhouse, and Llama 3.3 as the proven all-rounder, there's a model for every use case.
Combined with OpenClaw and Ollama, you get an AI agent stack that:
- Costs nothing (after hardware amortization)
- Works offline
- Is GDPR-compliant (no data shared with third parties)
- Matches cloud APIs in many scenarios
Break-even is at 3–8 months. After that, every token is free.
Want to run local LLMs with OpenClaw in production? Talk to us – we help with hardware recommendations, setup, and model selection.
More on this topic: What is OpenClaw? · OpenClaw Self-Hosting Guide · NanoClaw: The lean successor