Enterprise Gateway Layer 2026: LiteLLM, Portkey, Cloudflare, Kong, AWS Strands & Privacy Router

    Enterprise Gateway Layer 2026: LiteLLM, Portkey, Cloudflare, Kong, AWS Strands & Privacy Router

    4. Juni 202611 min readDeep Dive
    Till Freitag

    TL;DR: „To put an enterprise gateway in front of your agents, combine LiteLLM (multi-provider) + Portkey (governance) + Privacy Router (GDPR routing) today. Cloudflare is the fastest edge start, Kong the choice for regulated sectors, AWS Strands for pure AWS stacks."

    — Till Freitag

    Why an enterprise gateway in the first place?

    Once more than one team uses LLMs in production, you need four things in one place:

    1. Auth & RBAC – who may use which model with which budget?
    2. Observability & logging – who asked what, when, what did it cost, what came back?
    3. Routing by model/vendor – failover, cost optimization, PII awareness.
    4. Rate limiting & quotas – per team, per model, per time of day.

    Microsoft Scout was announced as an integrated enterprise gateway – but is not yet available. To go live in mid-2026, combine the following building blocks.

    Decision flowchart: which gateway fits you?

    Start: Do you need an LLM gateway?
      │
      ├─ GDPR-strict / on-premise mandate?
      │     │
      │     ├─ Yes →  Self-hosted OpenClaw + Privacy Router
      │     │         (local models, no hyperscaler)
      │     │
      │     └─ No  → continue ↓
      │
      ├─ Pure AWS stack with compliance mandate?
      │     │
      │     ├─ Yes →  AWS Strands / Bedrock AgentCore
      │     │         (IAM, CloudTrail, Bedrock models)
      │     │
      │     └─ No  → continue ↓
      │
      ├─ Regulated industry (bank, pharma, public sector)?
      │     │
      │     ├─ Yes →  Kong AI Gateway (self-hosted or Konnect EU)
      │     │         (mTLS, OAuth/OIDC, audit trails, plugin ecosystem)
      │     │
      │     └─ No  → continue ↓
      │
      ├─ Need PII redaction & prompt governance?
      │     │
      │     ├─ Yes →  Portkey AI Gateway (in front of LiteLLM)
      │     │         (guardrails, prompt versioning, A/B tests)
      │     │
      │     └─ No  → continue ↓
      │
      ├─ High volumes with cache potential & global edge?
      │     │
      │     ├─ Yes →  Cloudflare AI Gateway
      │     │         (DNS entry, 5 min, instant logs & cost caps)
      │     │
      │     └─ No  → continue ↓
      │
      └─ Default: multi-provider with quotas & spend tracking
            │
            └─→  LiteLLM Proxy (+ optional Portkey for governance)
                 (OpenAI-compatible, 100+ providers, Docker in 10 min)

    Deployment decision: self-hosted vs. VPC vs. managed vs. hybrid vs. air-gapped

    Before you pick a product, decide on the deployment model. It drives data residency, governance effort and operational complexity more than the feature list.

    Start: Where are prompts & logs allowed to live?
      │
      ├─ No data may leave the data center (public sector, hospital, defense)?
      │     │
      │     └─ Yes → Air-gapped (on-prem, no internet)
      │             Candidates: OpenClaw + Ollama/vLLM, Kong AI Gateway, LiteLLM
      │             Ops: high (manual updates, own monitoring)
      │
      ├─ Data must stay in your own cloud tenant (bank, insurance, pharma)?
      │     │
      │     └─ Yes → VPC / private cloud (EU region, customer-managed keys)
      │             Candidates: AWS Strands/Bedrock AgentCore, Kong (self-hosted in VPC),
      │                         LiteLLM/Portkey in your own EKS/AKS/GKE
      │             Ops: medium (hyperscaler handles infra)
      │
      ├─ GDPR-compliant, but a mix of sensitive & generic prompts?
      │     │
      │     └─ Yes → Hybrid (managed control plane + self-hosted data plane)
      │             Candidates: Portkey Hybrid, LiteLLM + Privacy Router,
      │                         Cloudflare AI Gateway with EU R2 + local fallback
      │             Ops: medium (two layers, clear routing policies required)
      │
      ├─ Standard SaaS, EU hosting is enough, time-to-value matters?
      │     │
      │     └─ Yes → Managed (SaaS / edge)
      │             Candidates: Portkey Cloud (EU), Cloudflare AI Gateway,
      │                         AWS Bedrock (Frankfurt)
      │             Ops: low (DPA + config, no infra)
      │
      └─ Full control, IaC pipelines, in-house SRE team?
            │
            └─ Yes → Self-hosted (Docker/K8s in your own cluster)
                    Candidates: LiteLLM, Portkey OSS, Kong, OpenClaw + Ollama
                    Ops: high (updates, HA, secrets rotation in your hands)

    Criteria matrix:

    Model Data residency Governance Operational complexity Time to value
    Air-gapped 100% on-prem, no internet Maximum (no third-country transfer, no DPA needed) Very high (updates, monitoring, HA on you) Weeks
    Self-hosted Own cluster (EU/on-prem) High (full auditability, own keys) High (SRE team, IaC, patch management) Days
    VPC / private cloud Own hyperscaler tenant (EU region) High (CMK, IAM, CloudTrail/audit logs) Medium (infra from hyperscaler, config on you) Days
    Hybrid Sensitive paths local, rest managed High (routing policies + audit on both layers) Medium (two layers, clear classification required) Days to weeks
    Managed (SaaS/edge) Vendor region (EU selectable) Medium (DPA + vendor certifications) Low (only config & keys) Hours

    💡 Rule of thumb: The stricter the data residency, the higher the operational complexity. Hybrid is the pragmatic middle ground when not all prompts are equally sensitive – PII/secrets local, generic in the cloud.

    💡 Till Freitag stack recommendation: LiteLLM as multi-provider front door + Portkey as governance layer + Privacy Router for GDPR-critical paths. Once Microsoft Scout is GA, this config can be migrated with manageable effort – skills and MCP configs stay the same.

    The six alternatives in detail – with enterprise workflows

    LiteLLM Proxy – The OpenAI-compatible multi-provider front door

    Setup: ~10 min (docker run litellm/litellm or pip install litellm). Hosting: self-hosted, EU hosting possible. 100+ LLMs behind a unified OpenAI API.

    Concrete enterprise workflows:

    • RBAC / Auth: virtual API keys per team with JWT validation. master_key → mints team_keys with their own model whitelists. SSO via OIDC (Okta, Entra ID) through a reverse proxy.
    • Logging / Observability: OTLP export to Langfuse, Grafana Loki or Datadog. Every request tagged with user_id, team_id, input/output tokens, cost, latency.
    • Routing by model/vendor: model aliases (gpt-4 → primary Azure OpenAI Frankfurt, fallback OpenAI US). Cost-based routing via model_list price info. Health checks every 60s.
    • Rate limiting: quotas per key (rpm, tpm, max_budget_usd_per_month). Soft & hard caps with alert webhooks at 80% spend.

    Portkey AI Gateway – The governance layer with guardrails

    Setup: ~15 min (Docker or cloud). Hosting: self-hosted (OSS) or EU cloud. Sits typically in front of LiteLLM or directly in front of the provider.

    Concrete enterprise workflows:

    • RBAC / Auth: workspaces per department, RBAC with admin / developer / viewer roles. Virtual keys with per-key guardrail configs.
    • Logging / Observability: built-in tracing dashboard with prompt diffs, cost attribution, PII hit rate. OTLP export for external stacks.
    • Routing by model/vendor: strategies for loadbalance, fallback, conditional routing (e.g., "PII detected → on-prem Ollama"), guardrails as pre/post filters (toxicity, PII, JSON-schema validation).
    • Rate limiting: per virtual key, per model, per time of day. Budget caps with auto-disable.

    Cloudflare AI Gateway – The managed edge gateway

    Setup: ~5 min (DNS entry or Worker binding). Hosting: managed, Cloudflare edge (EU PoPs available).

    Concrete enterprise workflows:

    • RBAC / Auth: Cloudflare Access (Zero Trust) as an IdP layer – employees authenticate against Entra ID / Okta before reaching the gateway. API tokens with scopes per service.
    • Logging / Observability: built-in analytics console (requests, cache hit rate, tokens, cost). Logs to R2 / Logpush in EU region (S3, Splunk, BigQuery).
    • Routing by model/vendor: multi-provider failover (e.g., Anthropic primary, OpenAI fallback). Caching on prompt hash saves up to 60% on recurring queries (marketing tools, classifiers).
    • Rate limiting: cost caps per token, requests/min per account or per user header. Edge-near limits keep providers from being contacted at all.

    Kong AI Gateway – The classic API gateway with AI plugins

    Setup: ~30 min (Helm / Docker). Hosting: self-hosted or Kong Konnect EU.

    Concrete enterprise workflows:

    • RBAC / Auth: mTLS between services, OAuth 2.0 / OIDC to Entra ID, Keycloak, Okta. Consumer model with ACLs per route – ideal for multi-tenant platforms.
    • Logging / Observability: plugins for OTLP, Prometheus, Datadog, Elastic. Audit trails on every route, request/response bodies optionally encrypted into the SIEM.
    • Routing by model/vendor: AI-Proxy plugin speaks Anthropic, OpenAI, Cohere, Mistral, Azure OpenAI. AI-Request-Transformer for prompt manipulation, AI-Response-Transformer for schema enforcement.
    • Rate limiting: Rate-Limiting-Advanced plugin (sliding window, redis-backed) per consumer, per route, per plan tier. AI-specific: tokens/min instead of just requests/min.

    AWS Strands / Bedrock AgentCore – The AWS-native stack

    Setup: ~30 min (AWS CLI + IAM + Bedrock console). Hosting: AWS cloud, Frankfurt region.

    Concrete enterprise workflows:

    • RBAC / Auth: IAM roles per Lambda/container, fine-grained per Bedrock model and per skill. SSO via IAM Identity Center, permission sets per department.
    • Logging / Observability: CloudTrail for every Bedrock API call (compliance audit trail), CloudWatch Logs Insights for queries, X-Ray for tracing. Models have built-in invocation logs to S3/CloudWatch.
    • Routing by model/vendor: inference profiles in Bedrock allow cross-region routing and model aliases. Bedrock Guardrails as a central PII/toxicity layer. Anthropic models, Llama, Mistral, Amazon Nova natively.
    • Rate limiting: service quotas per model and region. Per-application Provisioned Throughput for predictable latency. Budgets via AWS Cost Anomaly Detection with auto-alerts.

    Self-hosted OpenClaw + Privacy Router – The DIY enterprise gateway

    Setup: ~30 min (Docker Compose + Ollama). Hosting: on-premise, no data leakage. Detailed guide: self-hosting GDPR.

    Concrete enterprise workflows:

    • RBAC / Auth: reverse proxy (Traefik, Nginx, Authentik) with OIDC against your own Entra ID. Per-team configs as YAML, skill whitelists per team.
    • Logging / Observability: OpenTelemetry collector → Loki/Grafana or Elastic. Privacy Router logs the classification decision per request (local vs. cloud) – audit trail for the DPO.
    • Routing by model/vendor: Privacy Router decides per prompt: sensitive → local (Ollama, vLLM), generic → cheap cloud (Haiku, Mini), complex without PII → top cloud (Sonnet, GPT). Rules as YAML + ML classifier.
    • Rate limiting: Nginx or Traefik middlewares with per-team limits, token quotas via LiteLLM behind it (stacks compose well).

    Comparison matrix: privacy, compliance, latency & deployment

    Side-by-side overview of all six enterprise gateway options across the dimensions that matter most for procurement, security and platform teams.

    Gateway Privacy Compliance Latency Deployment models
    LiteLLM Proxy High – self-hosted, data passes only through your proxy SOC2-ready when self-hosted; logs/quotas configurable; no built-in PII redaction Low (added hop ~5–20 ms) Docker, Kubernetes/Helm, bare-metal, any cloud
    Portkey AI Gateway High self-hosted / Medium SaaS – built-in PII redaction & guardrails SOC2 (SaaS), GDPR-friendly self-hosted; prompt versioning & audit logs Low–Medium (10–30 ms incl. guardrails) SaaS, Docker, Kubernetes, hybrid
    Cloudflare AI Gateway Medium – managed edge, metadata stays at Cloudflare SOC2, ISO 27001, GDPR DPA; no on-prem option Very low (edge-routed, <10 ms overhead) Managed SaaS only (Cloudflare edge)
    Kong AI Gateway High – fully self-hosted, mTLS end-to-end SOC2, HIPAA, PCI, FedRAMP-ready; plugin-based audit trails Low (5–15 ms) Docker, Kubernetes/Helm, VM, on-prem, hybrid
    AWS Strands / Bedrock AgentCore High within AWS – data stays in your AWS account/region SOC2, ISO 27001, HIPAA, FedRAMP, EU-region pinning via Bedrock Low in-region (5–15 ms) AWS-managed only (Bedrock + IAM)
    Self-Hosted OpenClaw + Privacy Router Maximum – on-premise, sensitive prompts never leave the network Full GDPR/Schrems II control, BYO audit log, no third-country transfer Variable – local LLM latency depends on hardware (GPU recommended) Docker Compose, Kubernetes, on-prem, air-gapped

    How to read it: "Privacy" = where prompt/response data physically lives. "Compliance" = certifications and controls available out of the box. "Latency" = added gateway overhead, not the underlying model. "Deployment" = where you can actually run it today.

    Real-world deployment examples

    Short, practical scenarios – which deployment model fits which company type?

    Gateway Deployment example Typical setup
    LiteLLM Proxy Self-hosted: Tech scale-up with 8 dev teams, each gets a virtual API key. LiteLLM runs on a dedicated Kubernetes cluster in Hetzner Frankfurt. No data leakage, SOC2-ready through own audit logs. Kubernetes/Helm, own cloud, 2–3 replicas
    Portkey AI Gateway Hybrid: Mid-sized industrial company uses Portkey SaaS for prompt governance (versioning, guardrails), but routes GDPR-critical paths (customer data) through the self-hosted Portkey agent internally. SaaS + Docker agent on-premise, separate workspaces
    Cloudflare AI Gateway Managed: E-commerce startup with global traffic. DNS entry to Cloudflare, AI Gateway in front of all provider APIs. No own Kubernetes needed, logs automatically flow into R2 (EU region). Pure SaaS/edge deployment, no own infrastructure
    Kong AI Gateway VPC / On-premise: Bank with regulatory mTLS mandate. Kong Konnect EU in isolated VPC, end-to-end encryption, plugin-based audit trails. No data leakage to the public internet for sensitive transaction data. Kong Konnect EU or self-hosted Kubernetes, air-gapped option
    AWS Strands / Bedrock AgentCore Managed (AWS-only): Fintech already fully on AWS (IAM, CloudTrail, Cost Explorer). Bedrock inference profiles in Frankfurt, no third-country transfers. Provisioned throughput for predictable latency in payment processing. AWS-managed, Frankfurt region, IAM Identity Center
    Self-Hosted OpenClaw + Privacy Router Air-gapped / On-premise: Hospital with absolute offline mandate. OpenClaw + Ollama on internal servers, Privacy Router classifies every prompt locally (no cloud model ever involved). GDPR compliance without a data processing agreement. Docker Compose, internal network, no internet connection needed

    Quick-Select: which enterprise gateway for which profile?

    Profile Recommendation Why
    Fastest start Cloudflare AI Gateway DNS entry in 5 minutes, instant logs & cost caps
    Highest privacy control Self-hosted OpenClaw + Privacy Router Fully on-premise, sensitivity-aware routing
    Best overall package LiteLLM Proxy (+ optional Portkey) OpenAI-compatible, 100+ providers, quotas, spend tracking
    Regulated industry Kong AI Gateway mTLS, OAuth/OIDC, audit trails, plugin ecosystem
    AWS-only AWS Strands / Bedrock AgentCore IAM, CloudTrail, Bedrock inference profiles

    Migration path to Microsoft Scout (once GA)

    Once Microsoft Scout ships, it typically won't replace all of the above. Realistic picture:

    • LiteLLM → Scout: if Microsoft delivers on its multi-provider claim (open question), LiteLLM can be replaced for Azure-first shops.
    • Portkey stays useful if you want cross-provider governance.
    • Privacy Router stays essential – Scout is Azure-native and doesn't solve on-premise data routing.
    • Kong & AWS Strands stay, since they cover specific requirements (mTLS, AWS compliance) Scout doesn't replace.

    Till Freitag recommendation

    For 80% of enterprises today: LiteLLM (multi-provider) + Portkey (governance) + Privacy Router (GDPR routing) – fully open, productive in 1–2 days, migratable to Scout. AWS-only shops take Strands / AgentCore. Regulated industries with mTLS mandate: Kong AI Gateway. Want logs in 5 minutes: Cloudflare AI Gateway as edge front.

    The full market overview lives in the master article: The best OpenClaw alternatives 2026.

    More on this topic: Coding-Agent Layer · Multi-Agent Layer · Self-Hosted & Privacy Layer · Microsoft Scout as OpenClaw gateway · Privacy Router Guide · Master article

    TeilenLinkedInWhatsAppE-Mail

    Related Articles

    Self-Hosted & Privacy Layer 2026: Ontheia, Anything LLM & Privacy Router
    June 4, 20264 min

    Self-Hosted & Privacy Layer 2026: Ontheia, Anything LLM & Privacy Router

    If you take GDPR seriously, there's no way around self-hosting. Ontheia, Anything LLM, NanoClaw and the Privacy Router c…

    Read more
    The Best OpenClaw Alternatives 2026 – from NanoClaw to NullClawDeep Dive
    February 21, 202622 min

    The Best OpenClaw Alternatives 2026 – from NanoClaw to NullClaw

    OpenClaw has 200,000+ GitHub stars – but not everyone needs 430,000 lines of code. We compare 22 alternatives in mid-202…

    Read more
    Multi-Agent Layer 2026: AG2, LangGraph, SuperAGI & AWS Strands Compared
    June 4, 20264 min

    Multi-Agent Layer 2026: AG2, LangGraph, SuperAGI & AWS Strands Compared

    When one agent isn't enough: AG2, LangGraph, SuperAGI and AWS Strands compared. Which multi-agent stack fits which workf…

    Read more
    Claude Code vs OpenClaw – coding assistant compared to enterprise agent infrastructure
    April 28, 20263 min

    „Claude Code Killed OpenClaw" – Why That Comparison Makes No Sense

    People on LinkedIn keep saying „Claude Code killed OpenClaw." That's like comparing apples with interstellar spaceships.…

    Read more
    OpenClaw Self-Hosting Guide: GDPR-Compliant in 30 Minutes
    February 28, 20264 min

    OpenClaw Self-Hosting Guide: GDPR-Compliant in 30 Minutes

    Self-host OpenClaw with Docker, persistent storage, and local LLMs via Ollama – fully GDPR-compliant because no data eve…

    Read more
    June 3, 20265 min

    Microsoft Scout Runs on OpenClaw – Not „OpenClaw-like", It Is the Gateway

    Microsoft just showed off a new Copilot agent called Scout at Build. The tech bubble calls it „OpenClaw-like" – in reali…

    Read more
    June 3, 20265 min

    NVIDIA RTX Spark: When the Laptop Becomes the AI Cloud – Local AI First Gets Real

    DGX Spark was the prelude, RTX Spark is the rollout. Why NVIDIA's RTX Spark platform flips the cloud-default assumption …

    Read more
    Coding-Agent Layer 2026: OpenCode, Aider, Continue.dev & Co. Compared
    June 4, 20264 min

    Coding-Agent Layer 2026: OpenCode, Aider, Continue.dev & Co. Compared

    Deep dive into the coding-agent layer: which OpenClaw coding rival fits which workflow? OpenCode, Aider, Continue.dev, S…

    Read more
    Paperclip control plane showing an org chart of AI agents with CEO, managers, workers, approval gates and budget tracking
    April 28, 20266 min

    Paperclip: If OpenClaw Is the Employee, Paperclip Is the Company

    Paperclip is open-source infrastructure to run an entire AI-only company – org chart, budgets, approvals, audit trail. W…

    Read more