
Enterprise Gateway Layer 2026: LiteLLM, Portkey, Cloudflare, Kong, AWS Strands & Privacy Router
TL;DR: "To put an enterprise gateway in front of your agents, combine LiteLLM (multi-provider) + Portkey (governance) + Privacy Router (GDPR routing) today. Cloudflare is the fastest edge start, Kong the choice for regulated sectors, AWS Strands for pure AWS stacks."
– Till Freitag

Why an enterprise gateway in the first place?
Once more than one team uses LLMs in production, you need four things in one place:
- Auth & RBAC – who may use which model with which budget?
- Observability & logging – who asked what, when, what did it cost, what came back?
- Routing by model/vendor – failover, cost optimization, PII awareness.
- Rate limiting & quotas – per team, per model, per time of day.
Microsoft Scout was announced as an integrated enterprise gateway – but is not yet available. To go live in mid-2026, combine the following building blocks.
Decision flowchart: which gateway fits you?
Start: Do you need an LLM gateway?
│
├─ GDPR-strict / on-premise mandate?
│ │
│ ├─ Yes → Self-hosted OpenClaw + Privacy Router
│ │ (local models, no hyperscaler)
│ │
│ └─ No → continue ↓
│
├─ Pure AWS stack with compliance mandate?
│ │
│ ├─ Yes → AWS Strands / Bedrock AgentCore
│ │ (IAM, CloudTrail, Bedrock models)
│ │
│ └─ No → continue ↓
│
├─ Regulated industry (bank, pharma, public sector)?
│ │
│ ├─ Yes → Kong AI Gateway (self-hosted or Konnect EU)
│ │ (mTLS, OAuth/OIDC, audit trails, plugin ecosystem)
│ │
│ └─ No → continue ↓
│
├─ Need PII redaction & prompt governance?
│ │
│ ├─ Yes → Portkey AI Gateway (in front of LiteLLM)
│ │ (guardrails, prompt versioning, A/B tests)
│ │
│ └─ No → continue ↓
│
├─ High volumes with cache potential & global edge?
│ │
│ ├─ Yes → Cloudflare AI Gateway
│ │ (DNS entry, 5 min, instant logs & cost caps)
│ │
│ └─ No → continue ↓
│
└─ Default: multi-provider with quotas & spend tracking
│
└─→ LiteLLM Proxy (+ optional Portkey for governance)
(OpenAI-compatible, 100+ providers, Docker in 10 min)

Deployment decision: self-hosted vs. VPC vs. managed vs. hybrid vs. air-gapped
Before you pick a product, decide on the deployment model. It drives data residency, governance effort and operational complexity more than the feature list.
Start: Where are prompts & logs allowed to live?
│
├─ No data may leave the data center (public sector, hospital, defense)?
│ │
│ └─ Yes → Air-gapped (on-prem, no internet)
│ Candidates: OpenClaw + Ollama/vLLM, Kong AI Gateway, LiteLLM
│ Ops: high (manual updates, own monitoring)
│
├─ Data must stay in your own cloud tenant (bank, insurance, pharma)?
│ │
│ └─ Yes → VPC / private cloud (EU region, customer-managed keys)
│ Candidates: AWS Strands/Bedrock AgentCore, Kong (self-hosted in VPC),
│ LiteLLM/Portkey in your own EKS/AKS/GKE
│ Ops: medium (hyperscaler handles infra)
│
├─ GDPR-compliant, but a mix of sensitive & generic prompts?
│ │
│ └─ Yes → Hybrid (managed control plane + self-hosted data plane)
│ Candidates: Portkey Hybrid, LiteLLM + Privacy Router,
│ Cloudflare AI Gateway with EU R2 + local fallback
│ Ops: medium (two layers, clear routing policies required)
│
├─ Standard SaaS, EU hosting is enough, time-to-value matters?
│ │
│ └─ Yes → Managed (SaaS / edge)
│ Candidates: Portkey Cloud (EU), Cloudflare AI Gateway,
│ AWS Bedrock (Frankfurt)
│ Ops: low (DPA + config, no infra)
│
└─ Full control, IaC pipelines, in-house SRE team?
│
└─ Yes → Self-hosted (Docker/K8s in your own cluster)
Candidates: LiteLLM, Portkey OSS, Kong, OpenClaw + Ollama
Ops: high (updates, HA, secrets rotation in your hands)

Criteria matrix:
| Model | Data residency | Governance | Operational complexity | Time to value |
|---|---|---|---|---|
| Air-gapped | 100% on-prem, no internet | Maximum (no third-country transfer, no DPA needed) | Very high (updates, monitoring, HA on you) | Weeks |
| Self-hosted | Own cluster (EU/on-prem) | High (full auditability, own keys) | High (SRE team, IaC, patch management) | Days |
| VPC / private cloud | Own hyperscaler tenant (EU region) | High (CMK, IAM, CloudTrail/audit logs) | Medium (infra from hyperscaler, config on you) | Days |
| Hybrid | Sensitive paths local, rest managed | High (routing policies + audit on both layers) | Medium (two layers, clear classification required) | Days to weeks |
| Managed (SaaS/edge) | Vendor region (EU selectable) | Medium (DPA + vendor certifications) | Low (only config & keys) | Hours |
💡 Rule of thumb: The stricter the data residency, the higher the operational complexity. Hybrid is the pragmatic middle ground when not all prompts are equally sensitive – PII/secrets local, generic in the cloud.
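The deployment flowchart above reads as an ordered list of checks where the first match wins. A minimal sketch of that logic follows. The field names on `Requirements` are hypothetical labels for the flowchart questions, and treating "self-hosted" as the fall-through case is an assumption made to keep the function total.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical answers to the questions in the deployment flowchart."""
    no_data_leaves_datacenter: bool = False
    must_stay_in_own_cloud_tenant: bool = False
    mixed_sensitivity_prompts: bool = False
    eu_saas_is_enough: bool = False


def choose_deployment_model(req: Requirements) -> str:
    """Walk the flowchart top to bottom; the strictest matching branch wins."""
    if req.no_data_leaves_datacenter:
        return "air-gapped"
    if req.must_stay_in_own_cloud_tenant:
        return "vpc"
    if req.mixed_sensitivity_prompts:
        return "hybrid"
    if req.eu_saas_is_enough:
        return "managed"
    # Assumption: the last branch (full control, IaC, in-house SRE) is the default.
    return "self-hosted"
```

Because the checks are ordered by strictness, a hospital that answers "yes" to several questions still ends up air-gapped, which matches the rule of thumb above.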
💡 Till Freitag stack recommendation: LiteLLM as multi-provider front door + Portkey as governance layer + Privacy Router for GDPR-critical paths. Once Microsoft Scout is GA, this config can be migrated with manageable effort – skills and MCP configs stay the same.
The six alternatives in detail – with enterprise workflows
LiteLLM Proxy – The OpenAI-compatible multi-provider front door
Setup: ~10 min (docker run litellm/litellm or pip install 'litellm[proxy]'; the proxy server needs the proxy extras). Hosting: self-hosted, EU hosting possible. 100+ LLMs behind a unified OpenAI API.
Concrete enterprise workflows:
- RBAC / Auth: virtual API keys per team with JWT validation. master_key → mints team_keys with their own model whitelists. SSO via OIDC (Okta, Entra ID) through a reverse proxy.
- Logging / Observability: OTLP export to Langfuse, Grafana Loki or Datadog. Every request tagged with user_id, team_id, input/output tokens, cost, latency.
- Routing by model/vendor: model aliases (gpt-4 → primary Azure OpenAI Frankfurt, fallback OpenAI US). Cost-based routing via model_list price info. Health checks every 60s.
- Rate limiting: quotas per key (rpm, tpm, max_budget_usd_per_month). Soft & hard caps with alert webhooks at 80% spend.
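To make the alias fallback and the soft/hard budget caps concrete, here is a small sketch in plain Python. It does not use the LiteLLM API. The deployment names, `MODEL_ALIASES`, `check_budget` and `complete` are all hypothetical and only illustrate the behavior described in the list above.

```python
from typing import Callable

# Hypothetical alias table: one public model name, ordered list of deployments.
MODEL_ALIASES = {"gpt-4": ["azure-openai-frankfurt", "openai-us"]}


class BudgetExceeded(Exception):
    """Raised when a key has reached its hard spend cap."""


def check_budget(spent_usd: float, max_budget_usd: float, soft_ratio: float = 0.8) -> str:
    """Return 'ok' or 'alert' (soft cap reached); raise at the hard cap."""
    if spent_usd >= max_budget_usd:
        raise BudgetExceeded(f"spent {spent_usd:.2f} of {max_budget_usd:.2f} USD")
    if spent_usd >= soft_ratio * max_budget_usd:
        return "alert"  # a real gateway would fire the alert webhook here
    return "ok"


def complete(alias: str, prompt: str,
             backends: dict[str, Callable[[str], str]]) -> tuple[str, str]:
    """Try each deployment behind the alias in order; return (deployment, answer)."""
    errors = []
    for deployment in MODEL_ALIASES[alias]:
        try:
            return deployment, backends[deployment](prompt)
        except Exception as exc:  # any provider error triggers the fallback
            errors.append(f"{deployment}: {exc}")
    raise RuntimeError("all deployments failed: " + "; ".join(errors))


# Stand-in backends for a quick check: the primary is down, the fallback answers.
def primary_down(prompt: str) -> str:
    raise TimeoutError("primary unavailable")


def fallback_ok(prompt: str) -> str:
    return f"echo: {prompt}"


demo_backends = {"azure-openai-frankfurt": primary_down, "openai-us": fallback_ok}
```

Calling `complete("gpt-4", "ping", demo_backends)` skips the failing primary and returns the answer from the fallback deployment, while callers keep using the single alias.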
Portkey AI Gateway – The governance layer with guardrails
Setup: ~15 min (Docker or cloud). Hosting: self-hosted (OSS) or EU cloud. Typically sits in front of LiteLLM or directly in front of the provider.
Concrete enterprise workflows:
- RBAC / Auth: workspaces per department, RBAC with admin / developer / viewer roles. Virtual keys with per-key guardrail configs.
- Logging / Observability: built-in tracing dashboard with prompt diffs, cost attribution, PII hit rate. OTLP export for external stacks.
- Routing by model/vendor: strategies for loadbalance, fallback, conditional routing (e.g., "PII detected → on-prem Ollama"), guardrails as pre/post filters (toxicity, PII, JSON-schema validation).
- Rate limiting: per virtual key, per model, per time of day. Budget caps with auto-disable.
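The idea of guardrails as pre and post filters can be sketched without any gateway product. The example below is an illustration only and not Portkey's configuration format. The regex patterns are deliberately simplistic, and the target names `on-prem-ollama` and `cloud-provider` are hypothetical.

```python
import json
import re

# Deliberately simple PII patterns (email, IBAN-like); real guardrails cover far more.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
]


def contains_pii(prompt: str) -> bool:
    return any(pattern.search(prompt) for pattern in PII_PATTERNS)


def choose_target(prompt: str) -> str:
    """Pre-filter: prompts with detected PII go to the on-prem target."""
    return "on-prem-ollama" if contains_pii(prompt) else "cloud-provider"


def validate_json_output(raw: str, required_keys: set[str]) -> dict:
    """Post-filter: reject answers that are not a JSON object with the expected keys."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

The pre-filter runs before any provider is contacted, so a prompt containing an email address never leaves the network in this sketch. The post-filter runs on the model output and would typically trigger a retry or a fallback when it raises.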
Cloudflare AI Gateway – The managed edge gateway
Setup: ~5 min (DNS entry or Worker binding). Hosting: managed, Cloudflare edge (EU PoPs available).
Concrete enterprise workflows:
- RBAC / Auth: Cloudflare Access (Zero Trust) as an IdP layer – employees authenticate against Entra ID / Okta before reaching the gateway. API tokens with scopes per service.
- Logging / Observability: built-in analytics console (requests, cache hit rate, tokens, cost). Logs to R2 / Logpush in EU region (S3, Splunk, BigQuery).
- Routing by model/vendor: multi-provider failover (e.g., Anthropic primary, OpenAI fallback). Caching on prompt hash saves up to 60% on recurring queries (marketing tools, classifiers).
- Rate limiting: cost caps per token, requests/min per account or per user header. Limits enforced at the edge reject excess requests before they ever reach the provider.
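Caching on a prompt hash is easy to illustrate. The sketch below is an in-memory stand-in, not Cloudflare's implementation. The class `PromptCache` and its key layout (SHA-256 over model plus prompt) are assumptions chosen for the example.

```python
import hashlib
import json
from typing import Callable


class PromptCache:
    """In-memory stand-in for an edge cache keyed on a hash of model + prompt."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(model: str, prompt: str) -> str:
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, prompt: str,
                    call: Callable[[str, str], str]) -> str:
        cache_key = self.key(model, prompt)
        if cache_key in self._store:
            self.hits += 1
            return self._store[cache_key]
        self.misses += 1
        answer = call(model, prompt)  # only reached on a cache miss
        self._store[cache_key] = answer
        return answer
```

Only byte-identical prompts hit the cache in this sketch, which is why the savings show up mainly for recurring queries such as classifiers with fixed templates.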
Kong AI Gateway – The classic API gateway with AI plugins
Setup: ~30 min (Helm / Docker). Hosting: self-hosted or Kong Konnect EU.
Concrete enterprise workflows:
- RBAC / Auth: mTLS between services, OAuth 2.0 / OIDC to Entra ID, Keycloak, Okta. Consumer model with ACLs per route – ideal for multi-tenant platforms.
- Logging / Observability: plugins for OTLP, Prometheus, Datadog, Elastic. Audit trails on every route; request/response bodies can optionally be forwarded to the SIEM in encrypted form.
- Routing by model/vendor: AI-Proxy plugin speaks Anthropic, OpenAI, Cohere, Mistral, Azure OpenAI. AI-Request-Transformer for prompt manipulation, AI-Response-Transformer for schema enforcement.
- Rate limiting: Rate-Limiting-Advanced plugin (sliding window, redis-backed) per consumer, per route, per plan tier. AI-specific: tokens/min instead of just requests/min.
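The difference between request-based and token-based limits is easiest to see in code. The following sliding-window limiter is a sketch of the concept, not Kong's plugin. It keeps state in memory instead of Redis, and `TokenRateLimiter` with its `allow` method is a hypothetical name.

```python
from collections import deque


class TokenRateLimiter:
    """Sliding-window limiter that counts LLM tokens instead of requests."""

    def __init__(self, tokens_per_minute: int, window_seconds: float = 60.0) -> None:
        self.limit = tokens_per_minute
        self.window = window_seconds
        self._events: dict[str, deque] = {}  # consumer -> deque of (timestamp, tokens)

    def allow(self, consumer: str, tokens: int, now: float) -> bool:
        """Return True and record the usage if the consumer stays within the limit."""
        events = self._events.setdefault(consumer, deque())
        # Drop entries that have slid out of the window.
        while events and events[0][0] <= now - self.window:
            events.popleft()
        used = sum(count for _, count in events)
        if used + tokens > self.limit:
            return False
        events.append((now, tokens))
        return True
```

The timestamp is passed in explicitly so the behavior is easy to test. With a limit of 1,000 tokens per minute, one 600-token request leaves room for only 400 more tokens until the first entry ages out, regardless of how few requests were made.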
AWS Strands / Bedrock AgentCore – The AWS-native stack
Setup: ~30 min (AWS CLI + IAM + Bedrock console). Hosting: AWS cloud, Frankfurt region.
Concrete enterprise workflows:
- RBAC / Auth: IAM roles per Lambda/container, fine-grained per Bedrock model and per skill. SSO via IAM Identity Center, permission sets per department.
- Logging / Observability: CloudTrail for every Bedrock API call (compliance audit trail), CloudWatch Logs Insights for queries, X-Ray for tracing. Models have built-in invocation logs to S3/CloudWatch.
- Routing by model/vendor: inference profiles in Bedrock allow cross-region routing and model aliases. Bedrock Guardrails as a central PII/toxicity layer. Anthropic models, Llama, Mistral, Amazon Nova natively.
- Rate limiting: service quotas per model and region. Per-application Provisioned Throughput for predictable latency. Budgets via AWS Cost Anomaly Detection with auto-alerts.
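Fine-grained IAM per Bedrock model usually comes down to a policy that lists the allowed model ARNs. The helper below builds such a policy document as a plain dictionary. The function name and the statement ID are hypothetical, and the model ID in the usage note is only an example. Inference profiles use a different resource type and are not covered by this sketch.

```python
import json


def bedrock_invoke_policy(region: str, model_ids: list[str]) -> dict:
    """Build an IAM policy document that allows invoking only the listed models."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeApprovedModelsOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                # Foundation-model ARNs have no account ID segment.
                "Resource": [
                    f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
                    for model_id in model_ids
                ],
            }
        ],
    }


def to_json(policy: dict) -> str:
    """Serialize the policy for attaching to a role or permission set."""
    return json.dumps(policy, indent=2)
```

For instance, `bedrock_invoke_policy("eu-central-1", ["anthropic.claude-3-haiku-20240307-v1:0"])` yields a policy pinned to one model in the Frankfurt region. Anything not listed is denied implicitly, since IAM denies by default.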
Self-hosted OpenClaw + Privacy Router – The DIY enterprise gateway
Setup: ~30 min (Docker Compose + Ollama). Hosting: on-premise, no data leakage. Detailed guide: self-hosting GDPR.
Concrete enterprise workflows:
- RBAC / Auth: reverse proxy (Traefik, Nginx, Authentik) with OIDC against your own Entra ID. Per-team configs as YAML, skill whitelists per team.
- Logging / Observability: OpenTelemetry collector → Loki/Grafana or Elastic. Privacy Router logs the classification decision per request (local vs. cloud) – audit trail for the DPO.
- Routing by model/vendor: Privacy Router decides per prompt: sensitive → local (Ollama, vLLM), generic → cheap cloud (Haiku, Mini), complex without PII → top cloud (Sonnet, GPT). Rules as YAML + ML classifier.
- Rate limiting: Nginx or Traefik middlewares with per-team limits, token quotas via LiteLLM behind it (stacks compose well).
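The three-tier routing decision and its audit record can be sketched as follows. This is a simplified illustration, not the Privacy Router's actual rule engine. A keyword regex and a word-count threshold stand in for the YAML rules and the ML classifier, and all names (`SENSITIVE`, `Decision`, `classify`, `audit_record`) are hypothetical.

```python
import re
from dataclasses import dataclass

# Stand-in for real rules: a few sensitive keywords plus an email pattern.
SENSITIVE = re.compile(
    r"(?i)\b(patient|diagnosis|iban|salary|password)\b|[\w.+-]+@[\w-]+\.\w+"
)


@dataclass
class Decision:
    tier: str    # "local", "cheap-cloud" or "top-cloud"
    reason: str  # stored in the audit trail


def classify(prompt: str, complex_threshold_words: int = 150) -> Decision:
    """Sensitive prompts stay local; long ones go to the top tier; the rest go cheap."""
    if SENSITIVE.search(prompt):
        return Decision("local", "sensitive-pattern")
    if len(prompt.split()) > complex_threshold_words:
        return Decision("top-cloud", "long-prompt-no-sensitive-pattern")
    return Decision("cheap-cloud", "generic")


def audit_record(request_id: str, decision: Decision) -> dict:
    """Log the routing decision for the DPO, never the prompt itself."""
    return {"request_id": request_id, "tier": decision.tier, "reason": decision.reason}
```

The sensitivity check runs first on purpose: a long prompt that contains patient data must never be promoted to a cloud tier just because it looks complex.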
Comparison matrix: privacy, compliance, latency & deployment
Side-by-side overview of all six enterprise gateway options across the dimensions that matter most for procurement, security and platform teams.
| Gateway | Privacy | Compliance | Latency | Deployment models |
|---|---|---|---|---|
| LiteLLM Proxy | High – self-hosted, data passes only through your proxy | SOC2-ready when self-hosted; logs/quotas configurable; no built-in PII redaction | Low (added hop ~5–20 ms) | Docker, Kubernetes/Helm, bare-metal, any cloud |
| Portkey AI Gateway | High self-hosted / Medium SaaS – built-in PII redaction & guardrails | SOC2 (SaaS), GDPR-friendly self-hosted; prompt versioning & audit logs | Low–Medium (10–30 ms incl. guardrails) | SaaS, Docker, Kubernetes, hybrid |
| Cloudflare AI Gateway | Medium – managed edge, metadata stays at Cloudflare | SOC2, ISO 27001, GDPR DPA; no on-prem option | Very low (edge-routed, <10 ms overhead) | Managed SaaS only (Cloudflare edge) |
| Kong AI Gateway | High – fully self-hosted, mTLS end-to-end | SOC2, HIPAA, PCI, FedRAMP-ready; plugin-based audit trails | Low (5–15 ms) | Docker, Kubernetes/Helm, VM, on-prem, hybrid |
| AWS Strands / Bedrock AgentCore | High within AWS – data stays in your AWS account/region | SOC2, ISO 27001, HIPAA, FedRAMP, EU-region pinning via Bedrock | Low in-region (5–15 ms) | AWS-managed only (Bedrock + IAM) |
| Self-Hosted OpenClaw + Privacy Router | Maximum – on-premise, sensitive prompts never leave the network | Full GDPR/Schrems II control, BYO audit log, no third-country transfer | Variable – local LLM latency depends on hardware (GPU recommended) | Docker Compose, Kubernetes, on-prem, air-gapped |
How to read it: "Privacy" = where prompt/response data physically lives. "Compliance" = certifications and controls available out of the box. "Latency" = added gateway overhead, not the underlying model. "Deployment" = where you can actually run it today.
Real-world deployment examples
Short, practical scenarios – which deployment model fits which company type?
| Gateway | Deployment example | Typical setup |
|---|---|---|
| LiteLLM Proxy | Self-hosted: Tech scale-up with 8 dev teams, each gets a virtual API key. LiteLLM runs on a dedicated Kubernetes cluster in Hetzner Frankfurt. No data leakage, SOC2-ready through own audit logs. | Kubernetes/Helm, own cloud, 2–3 replicas |
| Portkey AI Gateway | Hybrid: Mid-sized industrial company uses Portkey SaaS for prompt governance (versioning, guardrails), but routes GDPR-critical paths (customer data) through the self-hosted Portkey agent internally. | SaaS + Docker agent on-premise, separate workspaces |
| Cloudflare AI Gateway | Managed: E-commerce startup with global traffic. DNS entry to Cloudflare, AI Gateway in front of all provider APIs. No own Kubernetes needed, logs automatically flow into R2 (EU region). | Pure SaaS/edge deployment, no own infrastructure |
| Kong AI Gateway | VPC / On-premise: Bank with regulatory mTLS mandate. Kong Konnect EU in isolated VPC, end-to-end encryption, plugin-based audit trails. No data leakage to the public internet for sensitive transaction data. | Kong Konnect EU or self-hosted Kubernetes, air-gapped option |
| AWS Strands / Bedrock AgentCore | Managed (AWS-only): Fintech already fully on AWS (IAM, CloudTrail, Cost Explorer). Bedrock inference profiles in Frankfurt, no third-country transfers. Provisioned throughput for predictable latency in payment processing. | AWS-managed, Frankfurt region, IAM Identity Center |
| Self-Hosted OpenClaw + Privacy Router | Air-gapped / On-premise: Hospital with absolute offline mandate. OpenClaw + Ollama on internal servers, Privacy Router classifies every prompt locally (no cloud model ever involved). GDPR compliance without a data processing agreement. | Docker Compose, internal network, no internet connection needed |
Quick-Select: which enterprise gateway for which profile?
| Profile | Recommendation | Why |
|---|---|---|
| Fastest start | Cloudflare AI Gateway | DNS entry in 5 minutes, instant logs & cost caps |
| Highest privacy control | Self-hosted OpenClaw + Privacy Router | Fully on-premise, sensitivity-aware routing |
| Best overall package | LiteLLM Proxy (+ optional Portkey) | OpenAI-compatible, 100+ providers, quotas, spend tracking |
| Regulated industry | Kong AI Gateway | mTLS, OAuth/OIDC, audit trails, plugin ecosystem |
| AWS-only | AWS Strands / Bedrock AgentCore | IAM, CloudTrail, Bedrock inference profiles |
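The decision flowchart from the beginning of the article can also be expressed as an ordered rule table, which is handy for an internal platform wizard. This is a sketch under assumptions: the flag names are hypothetical labels for the flowchart questions, and the order follows the flowchart from top to bottom.

```python
# Ordered rules mirroring the decision flowchart: the first matching flag wins.
RULES = [
    ("on_prem_mandate", "Self-hosted OpenClaw + Privacy Router"),
    ("pure_aws_with_compliance", "AWS Strands / Bedrock AgentCore"),
    ("regulated_industry", "Kong AI Gateway"),
    ("needs_pii_redaction", "Portkey AI Gateway (in front of LiteLLM)"),
    ("high_volume_cacheable", "Cloudflare AI Gateway"),
]
DEFAULT = "LiteLLM Proxy (+ optional Portkey for governance)"


def pick_gateway(flags: set[str]) -> str:
    """Return the first recommendation whose flag is set, else the default."""
    for flag, gateway in RULES:
        if flag in flags:
            return gateway
    return DEFAULT
```

A bank that also has cacheable high-volume traffic, `pick_gateway({"regulated_industry", "high_volume_cacheable"})`, still lands on Kong because the regulatory question is asked before the caching question.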
Migration path to Microsoft Scout (once GA)
Once Microsoft Scout ships, it most likely won't replace all of the above. A realistic picture:
- LiteLLM → Scout: if Microsoft delivers on its multi-provider claim (open question), LiteLLM can be replaced for Azure-first shops.
- Portkey stays useful if you want cross-provider governance.
- Privacy Router stays essential – Scout is Azure-native and doesn't solve on-premise data routing.
- Kong & AWS Strands stay, since they cover specific requirements (mTLS, AWS compliance) Scout doesn't replace.
Till Freitag recommendation
For 80% of enterprises today: LiteLLM (multi-provider) + Portkey (governance) + Privacy Router (GDPR routing) – fully open, productive in 1–2 days, migratable to Scout. AWS-only shops take Strands / AgentCore. Regulated industries with mTLS mandate: Kong AI Gateway. Want logs in 5 minutes: Cloudflare AI Gateway as edge front.
The full market overview lives in the master article: The best OpenClaw alternatives 2026.


