Local LLMs with OpenClaw: Ollama, Llama 3.3, Qwen 3.5 & MiniMax M2.5 – A Practical Benchmark

    Till Freitag · 28 February 2026 · 6 min read

    TL;DR: "Local LLMs with OpenClaw are production-ready in 2026. Llama 3.3 is the all-rounder, Qwen 3.5 the efficiency champion, MiniMax M2.5 the coding beast. All run via Ollama – no cloud, no cost, no privacy trade-offs."

    — Till Freitag

    Why Local LLMs?

    Cloud APIs are convenient – but they come with three problems:

    1. Cost: GPT-4o costs ~$10 per million output tokens. With heavy agent use, $300–700/month is realistic.
    2. Privacy: Every API call sends data to US servers. GDPR-compliant? Only with a data processing agreement and risk assessment.
    3. Dependency: API down? Rate limit reached? Your agent stops working.

    Local LLMs solve all three problems. And in 2026, they're finally good enough for production use.

    30-second version: Install Ollama, pull a model, connect OpenClaw – done. No API key, no per-token cost, no data shared with third parties.

    The Candidates

    We tested four models suitable for local use with OpenClaw:

    Model            | Provider | Parameters | Active Params | Context | Architecture
    -----------------|----------|------------|---------------|---------|-------------
    Llama 3.3        | Meta     | 70B        | 70B           | 128K    | Dense
    Qwen 3.5 27B     | Alibaba  | 27B        | 27B           | 256K    | Dense
    Qwen 3.5 35B-A3B | Alibaba  | 35B        | 3B            | 256K    | MoE
    MiniMax M2.5     | MiniMax  | 230B       | 10B           | 200K    | MoE

    What Does MoE Mean?

    Mixture of Experts (MoE) is the secret behind the newer models: MiniMax M2.5, for example, has 230B parameters in total, but only 10B are activated per token. The result: GPT-4-level quality at a fraction of the compute.
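    The gating idea behind MoE can be sketched in a few lines. This is an illustrative toy, not MiniMax's actual router: a gate scores all experts per token and only the top-k actually run.

    ```python
    import random

    def topk_gate(scores, k=2):
        """Return indices of the k highest-scoring experts."""
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

    # Toy MoE: 23 experts of 10B each ≈ 230B total, 1 expert (10B) active per token
    NUM_EXPERTS = 23
    PARAMS_PER_EXPERT_B = 10
    ACTIVE = 1

    random.seed(0)
    scores = [random.random() for _ in range(NUM_EXPERTS)]  # stand-in for the learned gate
    chosen = topk_gate(scores, ACTIVE)

    total = NUM_EXPERTS * PARAMS_PER_EXPERT_B
    used = ACTIVE * PARAMS_PER_EXPERT_B
    print(f"Active experts: {chosen}")
    print(f"{used}B of {total}B parameters used per token ({used/total:.0%})")
    ```

    The expert count and sizes here are made up for the illustration; the point is that compute per token scales with the active parameters, not the total.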

    Installation via Ollama

    All models can be downloaded with a single command:

    # Install Ollama (if not already installed)
    curl -fsSL https://ollama.com/install.sh | sh
    
    # Pull models
    ollama pull llama3.3           # 40 GB – needs 48 GB RAM
    ollama pull qwen3.5:27b        # 16 GB – runs on 22 GB RAM
    ollama pull qwen3.5:35b        # 20 GB – only 3B active (MoE)
    ollama pull minimax-m2.5       # 101 GB (3-bit) – needs 128 GB RAM
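    The download sizes above roughly follow parameters × bits-per-weight. A quick back-of-envelope check (rule of thumb only; real Ollama files add metadata overhead, and the exact bits/weight depends on the quantization scheme):

    ```python
    def approx_weight_gb(params_billion, bits_per_weight):
        """Rough weight-file size: parameters * bits / 8 bits-per-byte."""
        return params_billion * bits_per_weight / 8

    # MiniMax M2.5: 230B at ~3.5 bits/weight lands near the 101 GB download above
    print(round(approx_weight_gb(230, 3.5)), "GB")
    # Llama 3.3: 70B at ~4.5 bits/weight (Q4-style) is close to the 40 GB download
    print(round(approx_weight_gb(70, 4.5)), "GB")
    ```

    The same arithmetic explains the RAM requirements: the whole weight file has to fit in memory, plus headroom for the KV cache and the OS.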

    Connect to OpenClaw

    openclaw config set models.providers.ollama.apiKey "ollama-local"
    openclaw config set agents.defaults.model.primary "ollama/qwen3.5:27b"

    Performance Benchmarks

    Tested on Apple M3 Max (128 GB RAM) and NVIDIA RTX 4090 (24 GB VRAM):

    Speed (Tokens/Second)

    Model            | M3 Max (128 GB) | RTX 4090 (24 GB) | Notes
    -----------------|-----------------|------------------|------------------------------
    Llama 3.3 70B    | ~18 t/s         | ~25 t/s          | Needs a lot of RAM
    Qwen 3.5 27B     | ~35 t/s         | ~55 t/s          | Best speed/quality trade-off
    Qwen 3.5 35B-A3B | ~60 t/s         | ~80 t/s          | MoE turbo: only 3B active
    MiniMax M2.5     | ~15 t/s         | Not possible*    | Needs >24 GB VRAM

    *MiniMax M2.5 requires at least 64 GB RAM or a multi-GPU setup.
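    Tokens-per-second translates directly into perceived latency. A small sketch of what the table means in practice, using the M3 Max figures from above and an assumed 500-token agent reply:

    ```python
    def response_seconds(tokens, tokens_per_second):
        """Wall-clock time to stream a reply of `tokens` length."""
        return tokens / tokens_per_second

    # Time to generate a typical 500-token reply (t/s figures from the table above)
    for model, tps in [("Llama 3.3 70B", 18), ("Qwen 3.5 27B", 35), ("Qwen 3.5 35B-A3B", 60)]:
        print(f"{model}: ~{response_seconds(500, tps):.0f} s")
    ```

    At ~60 t/s a reply finishes in well under ten seconds, which is why the MoE variant feels interactive while the 70B dense model does not.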

    Quality (Benchmarks)

    Model            | MMLU-Pro | HumanEval | SWE-Bench | Agentic Use
    -----------------|----------|-----------|-----------|------------
    Llama 3.3 70B    | 68.9     | 82.5      | –         | ★★★★☆
    Qwen 3.5 27B     | 71.2     | 85.1      | –         | ★★★★☆
    Qwen 3.5 35B-A3B | 69.5     | 83.8      | –         | ★★★★☆
    MiniMax M2.5     | 74.1     | 89.3      | 80.2%     | ★★★★★

    Result: Qwen 3.5 27B offers the best trade-off between speed, quality, and resource consumption. MiniMax M2.5 is the strongest model but requires significantly more hardware.

    Cost Comparison: Cloud vs. Local

    Cloud Costs (per month, estimated at 50M tokens)

    Provider  | Model             | Input    | Output   | Total/Month
    ----------|-------------------|----------|----------|------------
    OpenAI    | GPT-4o            | $2.50/1M | $10/1M   | ~$300
    Anthropic | Claude 3.5 Sonnet | $3/1M    | $15/1M   | ~$400
    OpenAI    | GPT-4o mini       | $0.15/1M | $0.60/1M | ~$20

    Local Costs (one-time + electricity)

    Setup                   | Hardware   | One-time | Electricity/Month | Break-Even
    ------------------------|------------|----------|-------------------|------------
    Mac mini M4 Pro         | 48 GB RAM  | ~$2,400  | ~$15              | 7–8 months
    Mac Studio M3 Max       | 128 GB RAM | ~$4,900  | ~$25              | 12–15 months
    Linux Server + RTX 4090 | 64 GB RAM  | ~$3,200  | ~$40              | 8–10 months
    Raspberry Pi 5          | 8 GB RAM   | ~$130    | ~$5               | 1 month

    Bottom line: After roughly 8 months, self-hosting undercuts the premium cloud APIs. With heavy usage (>100M tokens/month), break-even drops to 3–4 months.
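    The break-even figures can be reproduced with simple arithmetic, assuming the hardware, electricity, and cloud costs from the tables above:

    ```python
    def break_even_months(hardware_cost, electricity_per_month, cloud_bill_per_month):
        """Months until a one-time hardware cost beats the recurring cloud bill."""
        monthly_savings = cloud_bill_per_month - electricity_per_month
        if monthly_savings <= 0:
            raise ValueError("cloud is cheaper at this usage level")
        return hardware_cost / monthly_savings

    # Mac mini M4 Pro ($2,400, ~$15/month power) vs. a ~$300/month GPT-4o bill
    print(f"{break_even_months(2400, 15, 300):.1f} months")
    ```

    This lands at ~8.4 months, consistent with the 7–8 month range in the table (which assumes a slightly higher cloud bill). Note the flip side: against a $20/month GPT-4o mini bill, the same hardware never pays for itself.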

    Offline Scenarios

    Local LLMs have one decisive advantage no cloud can offer: They work without internet.

    When Is Offline Relevant?

    • On the road: On trains, planes, construction sites – anywhere without stable internet
    • Air-gapped environments: Security-critical infrastructure (government, military, healthcare)
    • Edge deployments: IoT gateways, factory floors, remote offices
    • Resilience: When the cloud API goes down, your agent keeps running

    # Compact model for offline use on modest hardware
    ollama pull qwen3.5:35b    # MoE: only 3B active, runs on 22 GB RAM
    
    # For Raspberry Pi / edge devices
    ollama pull phi-3:mini      # 3.8B parameters, 4 GB RAM

    OpenClaw Offline Config

    {
      "agents": {
        "defaults": {
          "model": {
            "primary": "ollama/qwen3.5:35b",
            "fallbacks": ["ollama/phi-3:mini"]
          }
        }
      },
      "network": {
        "offline_mode": true,
        "web_search": false
      }
    }
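    Before going offline, it is worth checking that no cloud model slipped into the fallback chain. A minimal sketch of such a sanity check, using the config structure shown above (`all_models_local` is a hypothetical helper, not part of OpenClaw):

    ```python
    import json

    # The offline config from above, inlined as a string for the sketch
    raw = """{
      "agents": {"defaults": {"model": {
          "primary": "ollama/qwen3.5:35b",
          "fallbacks": ["ollama/phi-3:mini"]
      }}},
      "network": {"offline_mode": true, "web_search": false}
    }"""

    def all_models_local(cfg):
        """True if the primary and all fallbacks point at the local ollama provider."""
        model = cfg["agents"]["defaults"]["model"]
        return all(name.startswith("ollama/")
                   for name in [model["primary"], *model.get("fallbacks", [])])

    cfg = json.loads(raw)
    if cfg["network"]["offline_mode"]:
        assert all_models_local(cfg), "offline mode set, but a cloud model is configured"
        print("offline config OK")
    ```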

    Which Model for Which Use Case?

    Use Case        | Recommended Model | Why
    ----------------|-------------------|-------------------------------------------
    Email triage    | Qwen 3.5 27B      | Fast, 256K context for long threads
    Code analysis   | MiniMax M2.5      | SWE-Bench 80.2%, best coding model
    Quick responses | Qwen 3.5 35B-A3B  | MoE: 60+ t/s on Apple Silicon
    Summarization   | Llama 3.3 70B     | Solid quality, broad language understanding
    Offline / edge  | Qwen 3.5 35B-A3B  | MoE + 256K context at low resource use
    Raspberry Pi    | Phi-3 Mini        | Only model under 4 GB RAM

    Qwen 3.5: The Newcomer in Detail

    Alibaba's Qwen 3.5 deserves special attention. The model family brings several firsts in 2026:

    • 256K context: Twice as much as Llama 3.3 – ideal for long email threads or document analysis
    • 201 languages: A true multilingual model, perfect for international teams
    • Multimodal: The 27B and 122B variants can also process images
    • Thinking mode: Built-in chain-of-thought reasoning, toggleable per parameter
    • MoE variants: 35B-A3B activates only 3B parameters – runs on a MacBook Air

    # Enable thinking mode (for complex tasks)
    ollama run qwen3.5:27b --thinking

    MiniMax M2.5: The Coding Beast

    MiniMax M2.5 from Shanghai took the AI community by surprise:

    • SWE-Bench Verified: 80.2% – on par with Claude Opus 4.6
    • 230B parameters, 10B active: MoE architecture for efficiency
    • Agentic design: Natively optimized for tool calling and search
    • 200K context: Enough for complete codebases

    The catch: You need at least 64 GB RAM (ideally 128 GB) for the 3-bit quantized model. But if you have the hardware, you get a model that competes with the best cloud APIs – at zero per-token cost.

    # MiniMax M2.5 via Ollama (needs a lot of RAM!)
    ollama pull minimax-m2.5
    openclaw config set agents.defaults.model.primary "ollama/minimax-m2.5"

    Hybrid Strategy: Best of Both Worlds

    Our recommendation for productive teams:

    Task                             | Model             | Local/Cloud
    ---------------------------------|-------------------|------------
    Email & customer data            | Qwen 3.5 27B      | 🏠 Local
    Code reviews                     | MiniMax M2.5      | 🏠 Local
    Quick routine tasks              | Qwen 3.5 35B-A3B  | 🏠 Local
    Complex analysis (non-sensitive) | Claude 3.5 Sonnet | ☁️ Cloud
    Image generation                 | DALL-E 3 / Flux   | ☁️ Cloud

    Rule of thumb: Personal data → always local. Everything else → based on budget and quality requirements.
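    The rule of thumb fits in a few lines of code. This is a sketch, not an OpenClaw feature: `route` and the `contains_personal_data` flag are hypothetical (in practice the flag would come from a classifier or policy engine); the model names come from the tables above.

    ```python
    def route(task_type, contains_personal_data):
        """Rule of thumb as code: personal data always goes local."""
        local = {
            "email": "ollama/qwen3.5:27b",
            "code_review": "ollama/minimax-m2.5",
            "routine": "ollama/qwen3.5:35b",
        }
        if contains_personal_data or task_type in local:
            # sensible local default for sensitive tasks we have no mapping for
            return local.get(task_type, "ollama/qwen3.5:27b")
        return "anthropic/claude-3.5-sonnet"  # complex, non-sensitive work

    print(route("email", True))      # stays local
    print(route("analysis", False))  # may go to the cloud
    ```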

    Conclusion

    Local LLMs are no longer a compromise in 2026 – they're a strategic decision. With Qwen 3.5 as the efficiency champion, MiniMax M2.5 as the coding powerhouse, and Llama 3.3 as the proven all-rounder, there's a model for every use case.

    Combined with OpenClaw and Ollama, you get an AI agent stack that:

    • Costs nothing (after hardware amortization)
    • Works offline
    • Is GDPR-compliant (no data shared with third parties)
    • Matches cloud APIs in many scenarios

    Break-even is at 3–8 months. After that, tokens cost you nothing but electricity.


    Want to run local LLMs with OpenClaw in production? Talk to us – we help with hardware recommendations, setup, and model selection.

    More on this topic: What is OpenClaw? · OpenClaw Self-Hosting Guide · NanoClaw: The lean successor

