
Self-Hosted & Privacy Layer 2026: Ontheia, Anything LLM & Privacy Router
TL;DR: „GDPR-compliant AI is no longer theory in 2026. Ontheia + Privacy Router + Ollama deliver the full local-AI-first stack today, without hyperscalers. Once RTX Spark ships, this becomes the default for mid-market and enterprise."
— Till FreitagWhy self-host at all?
Three drivers push mid-market and enterprise to self-hosting in 2026:
- GDPR & Schrems II – prompts with customer data must not flow to the US.
- Industry regulation – BaFin, KRITIS, pharma, public sector demand on-premise or at least EU sovereignty.
- Cost control – equipping 100+ employees with LLMs quickly costs five figures per month at Anthropic/OpenAI.
Self-hosting doesn't mean "buy an LLM and forget about it". It means cleanly separating runtime (agent), routing layer (privacy-aware), model layer (local LLM). These three building blocks are what we look at.
The runtime tier: Ontheia, Anything LLM, NanoClaw
Ontheia – The EU-native open-source runtime
TypeScript, Docker, AGPL-3.0. Speaks Anthropic, OpenAI, Gemini, Grok and Ollama out of the box. Setup: 15 minutes (Docker Compose).
- Typical workflow: Ontheia as the central agent runtime in your own data center. Users chat via your own web frontend, skills run on your own infrastructure, data never leaves the house.
- Best for: EU mid-market companies that want an OpenClaw-like experience without an Anthropic dependency.
- Strength: AGPL forces forks to stay open – predictable roadmap, no embrace-extend-extinguish risk.
Anything LLM – The all-in-one hub
34,000+ stars. RAG, multi-LLM, workspace concept, browser UI, desktop app. Setup: 20 minutes (Docker or desktop installer).
- Typical workflow: workspace per department, own document collections, each workspace binds its own LLM (Ollama, Anthropic, Mistral). RAG built in – drop in a PDF, ask a question, get an answer with sources.
- Best for: knowledge work, internal knowledge management, onboarding assistants.
- Strength: lowest barrier of all self-hosting options. Runnable even without a DevOps team.
NanoClaw – The security-focused OpenClaw clone
8,400+ stars, container isolation per skill, WhatsApp integration. Setup: 30 minutes (Docker Compose + skill config).
- Typical workflow: like OpenClaw, but every skill runs in its own container with least-privilege networking. Ideal for risky skills (browser automation, code execution).
- Best for: teams that want OpenClaw power but need to shrink its attack surface.
- Strength: security by design instead of security patch.
The routing layer: Privacy Router
The Privacy Router is our own open-source tool. It sits between runtime and LLM and decides per request which model answers:
- Sensitive prompt (person names, IBAN, medical data) → local model (Ollama, vLLM).
- Generic prompt → cheap cloud model (Haiku, Mini).
- Complex reasoning prompt without PII → best cloud model (Sonnet, GPT).
Setup: 10 minutes. Configuration as YAML, rules via RegEx + ML classifier.
- Typical workflow: runtime calls Privacy Router instead of OpenAI/Anthropic directly. Router classifies, routes, logs – audit trail included.
- Best for: hybrid stacks that need to combine cost optimization and GDPR.
The model layer: Ollama, vLLM, llama.cpp
- Ollama – zero barrier.
ollama run mistraland done. Best for: laptops, single user, prototypes. - vLLM – production-grade. Paged attention, high throughput, OpenAI-compatible API. Best for: central GPU servers, multi-user workloads.
- llama.cpp – maximally portable. Runs on Apple Silicon, CPU, embedded devices. Best for: edge scenarios.
Hardware layer (announced): NVIDIA RTX Spark
The announced RTX Spark is set to deliver 1,700 tokens/s – enough to run 30B models for a 50-person team at acceptable latency. Status: announced, not yet available. Today, bridge with RTX 6000 Ada, H100 or Apple M Studio clusters.
Quick-Select: which self-hosting stack for which profile?
| Profile | Recommendation | Why |
|---|---|---|
| Fastest start | Anything LLM Desktop + Ollama | One-click installer, RAG included |
| Highest privacy control | Ontheia + Privacy Router + vLLM | Fully on-premise, deterministic routing |
| Best overall package | NanoClaw + Privacy Router + Ollama | Container isolation, hybrid model mix |
| Edge / embedded | llama.cpp + custom runtime | Runs on any device, no server needed |
Typical workflows by use case
- GDPR-compliant internal knowledge assistant: Anything LLM + Ollama (Mistral 7B) on a workstation PC. Documents stay in-house, answers with sources.
- Hybrid stack with cost optimization: Ontheia → Privacy Router → (Ollama for PII | Claude Haiku for generic | Claude Sonnet for complex). Saves 60–80% cloud cost at full compliance.
- High-risk skill (browser automation): NanoClaw with container isolation. Skill may only hit one domain, no filesystem access, network egress logged.
- Edge deployment (machine, vehicle, kiosk): llama.cpp + small 3B model. Works offline, zero cloud risk.
- Pilot without IT budget: Anything LLM Desktop, locally on a MacBook M3 with Ollama. Productive in 30 minutes.
Till Freitag recommendation
Start today: Anything LLM + Ollama on a decent workstation. When the pilot is live: migrate to Ontheia + Privacy Router + vLLM in your own data center. Once RTX Spark ships: hardware refresh – then local-AI-first becomes feasible for 50- to 200-person teams without latency compromises.
The full market overview lives in the master article: The best OpenClaw alternatives 2026. Hands-on step-by-step in the self-hosting GDPR guide.
More on this topic: Coding-Agent Layer · Multi-Agent Layer · Enterprise Gateway Layer · Privacy Router Guide · Master article


