Kong AI Gateway + OpenWebUI
This showcase demonstrates how to use Kong AI Gateway as a centralized proxy for LLM access, combined with OpenWebUI as a chat frontend. Kong handles authentication, authorization, and (with Enterprise license) rate limiting and prompt guardrails — so individual users never need direct access to LLM provider API keys.
Architecture
┌─────────────────┐
┌───▶│ Ollama (local) │ Default, free
│ │ - llama3.2:1b │ Chat model
┌────────────┐ ┌──────────────────────┐ │ │ - nomic-embed │ Embeddings (RAG)
│ Browser │──▶ │ Kong AI Gateway │──┤ └─────────────────┘
│ (OpenWebUI)│ │ │ │ ┌─────────────────┐
└────────────┘ │ - ai-proxy plugin │ ├───▶│ Google Gemini │ Optional, free tier
│ - key-auth plugin │ │ └─────────────────┘
│ - http-log (metrics)│ │ ┌─────────────────┐
│ - rate-limit (Ent.) │ └───▶│ Anthropic API │ Optional, paid
│ - prompt-guard (E.) │ └─────────────────┘
└──────────────────────┘
│
┌────────────────┼────────────────┐
▼ ▼ ▼
RAG Pipeline Monitoring Token Metrics
Docs → Embed Prometheus → http-log →
→ ChromaDB → Grafana AI Metrics
Context Dashboard ExporterMultiple local models and cloud providers are supported, each with its own Kong route:
| Model / Provider | Route | Type | Cost |
|---|---|---|---|
| llama3.2:1b (Ollama) | /ollama/v1/chat/completions | General chat | Free (local) |
| qwen2.5-coder:1.5b (Ollama) | /coder/v1/chat/completions | Code generation | Free (local) |
| gemma3:1b (Ollama) | /gemma/v1/chat/completions | Alternative chat | Free (local) |
| Google Gemini (gemini-2.5-flash) | /gemini/v1/chat/completions | Cloud — fast | Free tier / Optional |
| Anthropic Claude (claude-haiku-3-5) | /anthropic/v1/chat/completions | Cloud — quality | Paid / Optional |
Preconditions
- k3d cluster with Kong Ingress Controller deployed (
examples/kong-gateway) helmCLI installed- Optional: Google Gemini API Key (free from aistudio.google.com)
- Optional: Anthropic API Key (paid, from console.anthropic.com)
Getting API Keys
Google Gemini (free)
- Go to aistudio.google.com
- Sign in with your Google account
- Click "Get API key" in the left sidebar
- Click "Create API key" and select a Google Cloud project (or create one)
- Copy the generated key — it starts with
AIza...
The free tier includes 15 requests/minute and 1 million tokens/day — more than enough for this demo.
Anthropic (paid, optional)
- Go to console.anthropic.com
- Create an account and add a payment method
- Navigate to Settings → API keys
- Click "Create Key" and copy it — it starts with
sk-ant-...
Note: Anthropic does not offer a free tier. You can skip this and still use the demo with Ollama and Gemini.
Adding keys after installation
If you skipped the API key prompts during setup.sh, you can add them later:
# Add Gemini key
kubectl create secret generic gemini-api-key \
--namespace kong --from-literal=api-key="YOUR_GEMINI_KEY"
kubectl apply -f kong-ai-plugin-gemini.yaml
kubectl apply -f kong-ai-route-gemini.yaml
# Add Anthropic key
kubectl create secret generic anthropic-api-key \
--namespace kong --from-literal=api-key="YOUR_ANTHROPIC_KEY"
kubectl apply -f kong-ai-plugin-anthropic.yaml
kubectl apply -f kong-ai-route-anthropic.yamlDNS preparation
Add the following entries to /etc/hosts:
127.0.0.1 ai.example.com chat.example.com keycloak.example.com grafana.example.com kong-manager.example.com kong-admin.example.comInstallation
cd examples/kong-ai-gateway
bash setup.shNon-interactive complete setup
To deploy the whole showcase with every optional component (OpenBao, monitoring stack, Kuma mesh) and no prompts, use the wrapper:
cd examples/kong-ai-gateway
bash complete-setup.shIt runs setup.sh with NON_INTERACTIVE=1 and DEPLOY_OPENBAO, DEPLOY_MONITORING, DEPLOY_KUMA all set to y. Enterprise features activate automatically when license.json is present. External providers stay opt-in — export the keys first to wire them in, or turn a component off:
GEMINI_API_KEY=... ANTHROPIC_API_KEY=... bash complete-setup.sh
DEPLOY_KUMA=n bash complete-setup.shThe script will:
- Verify that Kong Ingress Controller is running
- Optionally deploy OpenBao and cert-manager for TLS certificates
- Deploy Ollama with
llama3.2:1b(chat) andnomic-embed-text(RAG embeddings), optionallyqwen2.5-coder:1.5bandgemma3:1b - Optionally prompt for Gemini and Anthropic API keys
- Apply Kong AI plugins per provider (ai-proxy, key-auth, per-user model filtering)
- Deploy three consumers with different access levels (
dev-user,team-lead,admin-user) - Optionally deploy Prometheus + Grafana monitoring with pre-built AI dashboard, token metrics, and cost estimation
- Optionally enable Kuma Service Mesh with mTLS for confidential computing
- Deploy OpenWebUI via Helm (all traffic including RAG embeddings routed through Kong)
- Seed a platform admin account so OIDC users get the
userrole (no settings access) - Set up HTTPRoutes for all endpoints
Kong Enterprise (optional)
Place a license.json file in the examples/kong-ai-gateway/ directory before running setup.sh. This enables:
- Kong Manager UI at
https://kong-manager.example.com:8081 - Keycloak as OpenID Connect identity provider (3 users:
dev,lead,admin) - OIDC Consumer Mapping — JWT
preferred_usernameclaim maps to Kong consumers - AI Gateway Failover —
ai-proxy-advancedwith priority-based failover (Ollama → Gemini → Anthropic) - Dual Authentication — API key OR OIDC token accepted on all AI routes
- AI Rate Limiting Advanced — token-based rate limiting (10,000 tokens/minute per consumer)
- AI Prompt Guard — blocks prompt injection attempts (e.g., "ignore previous instructions")
Note on RBAC: Kong's Role-Based Access Control (RBAC) for the Admin API and Manager UI requires a database-backed deployment. This showcase uses Kong in DB-less mode (via Kong Ingress Controller), which does not support RBAC. In production, consider running Kong in hybrid mode with a PostgreSQL database to enable RBAC, or restrict access to the Admin API via network policies.
Components
| Component | Source | Namespace |
|---|---|---|
| Ollama (llama3.2:1b + nomic-embed-text) | Deployment manifest | ai-platform |
| ChromaDB (RAG vector store) | Built into OpenWebUI | ai-platform |
| OpenWebUI | Helm chart | ai-platform |
| Kong AI Proxy Plugins | KongPlugin CRD | kong |
| Key Authentication | KongPlugin CRD | kong |
| Demo Consumer | KongConsumer CRD | kong |
| AI Metrics Exporter (optional) | Deployment manifest | monitoring |
| Prometheus (optional) | Helm chart | monitoring |
| Grafana (optional) | Helm chart | monitoring |
| Kong Prometheus Plugin (optional) | KongClusterPlugin CRD | cluster-wide |
| Keycloak (Enterprise) | Deployment manifest | ai-platform |
| AI Rate Limiting (Enterprise) | KongPlugin CRD | kong |
| AI Prompt Guard (Enterprise) | KongPlugin CRD | kong |
| Kuma Service Mesh (optional) | Helm chart | kuma-cp |
| mTLS Traffic Policies | MeshTrafficPermission CRD | kuma-cp |
Endpoints
| Service | URL | Auth |
|---|---|---|
| OpenWebUI | https://chat.example.com:8081 | OSS: open / Enterprise: Keycloak OIDC / Admin: admin@ai-platform.local / admin |
| AI Proxy (Ollama) | https://ai.example.com:8081/ollama/v1/chat/completions | key-auth (apikey header) |
| AI Proxy (Gemini) | https://ai.example.com:8081/gemini/v1/chat/completions | key-auth (apikey header) |
| AI Proxy (Anthropic) | https://ai.example.com:8081/anthropic/v1/chat/completions | key-auth (apikey header) |
| AI Failover (Enterprise) | https://ai.example.com:8081/ai/v1/chat/completions | key-auth / OIDC |
| Grafana (optional) | https://grafana.example.com:8081 | admin / admin |
| Kong Manager (Enterprise) | https://kong-manager.example.com:8081 | — |
| Kong Admin API (Enterprise) | https://kong-admin.example.com:8081 | — |
| Keycloak Admin (Enterprise) | https://keycloak.example.com:8081 | admin / admin (Keycloak admin) |
Testing
Direct API call via curl (Ollama — free, always available)
curl -k -H "apikey: dev-key-12345" \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"Hello! What is Kubernetes?"}],"model":"llama3.2:1b"}' \
https://ai.example.com:8081/ollama/v1/chat/completionsDirect API call via curl (Gemini — if configured)
# Requires team-lead or admin-user API key (dev-user returns 403)
curl -k -H "apikey: lead-key-12345" \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"Hello! What is Kubernetes?"}],"model":"gemini-2.5-flash"}' \
https://ai.example.com:8081/gemini/v1/chat/completionsModel list per consumer
# dev-user sees Ollama models only
curl -k -H "apikey: dev-key-12345" https://ai.example.com:8081/ollama/v1/models
# lead-user sees Ollama + Gemini
curl -k -H "apikey: lead-key-12345" https://ai.example.com:8081/ollama/v1/models
# admin-user sees all models
curl -k -H "apikey: admin-key-12345" https://ai.example.com:8081/ollama/v1/modelsOpenWebUI
- Open
https://chat.example.com:8081in your browser - OSS mode: No login required — open access
- Enterprise mode: Click "Login with Keycloak" (
dev/dev,lead/lead, oradmin/admin) - The model dropdown shows only the models your role is authorized for
- Select a model and start chatting
Verify API authentication
Without a valid API key, the request is rejected:
curl -k -H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"Hello"}],"model":"llama3.2:1b"}' \
https://ai.example.com:8081/ollama/v1/chat/completions
# Returns 401 UnauthorizedVerify model access control
# dev-user trying to use Gemini → 403 Forbidden
curl -k -H "apikey: dev-key-12345" \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"Hello"}],"model":"gemini-2.5-flash"}' \
https://ai.example.com:8081/gemini/v1/chat/completionsHow it works
AI Proxy Plugin
External routes — each provider gets its own ai-proxy KongPlugin instance and HTTPRoute:
- Ollama: Uses
provider: openaiwithupstream_urlpointing to the local Ollama service. - Gemini: Injects the Gemini API key as a query parameter and transforms requests from OpenAI format to Gemini format.
- Anthropic: Injects the Anthropic API key as an
x-api-keyheader and transforms requests from OpenAI to Anthropic format.
Internal route (OpenWebUI) — a single ai-proxy-advanced plugin with model_alias handles all providers through one endpoint:
OpenWebUI → Kong /ollama/v1/chat/completions
│
├─ model: "llama3.2:1b" → ai-proxy-advanced → Ollama
├─ model: "gemma3:1b" → ai-proxy-advanced → Ollama
├─ model: "gemini-2.5-flash" → ai-proxy-advanced → Gemini API
└─ model: "claude-haiku-*" → ai-proxy-advanced → Anthropic APIEach target in the ai-proxy-advanced plugin has a model_alias matching the model name that clients send. Kong routes the request to the correct provider based on the model field in the request body — no separate routes needed.
All providers expose an OpenAI-compatible interface via Kong, so any OpenAI client (like OpenWebUI) can use them transparently.
Model Recommendations
This demo uses small models (1b–1.5b parameters) optimized for k3d on laptops. For production or GPU-equipped environments, consider larger models:
| Use Case | Demo (CPU, k3d) | Production (GPU) | Parameters |
|---|---|---|---|
| General Chat | llama3.2:1b | llama3.3:70b, qwen2.5:32b | 1b → 32–70b |
| Code Generation | qwen2.5-coder:1.5b | qwen2.5-coder:32b, codellama:34b | 1.5b → 32–34b |
| Alternative Chat | gemma3:1b | gemma3:27b, mistral:7b | 1b → 7–27b |
| Embeddings (RAG) | nomic-embed-text | nomic-embed-text, mxbai-embed-large | 137M → 335M |
To add a model, pull it in Ollama and create a new ai-proxy plugin + route in Kong:
# Pull a new model
kubectl exec -n ai-platform deployment/ollama -- ollama pull mistral:7b
# Create plugin + route (follow the pattern of existing models)
# Then update the models response listKong acts as a model router — each model gets its own route with independent authentication, rate limiting, and monitoring. This enables per-model access control (e.g., restrict expensive GPU models to specific consumer groups).
Consumer Groups & Access Control
Three consumer groups with different provider access levels are pre-configured:
| Consumer | API Key | Keycloak User | Ollama | Gemini | Anthropic |
|---|---|---|---|---|---|
dev-user | dev-key-12345 | dev / dev | Yes | No (403) | No (403) |
team-lead | lead-key-12345 | lead / lead | Yes | Yes | No (403) |
admin-user | admin-key-12345 | admin / admin | Yes | Yes | Yes |
Access control is enforced at two levels in Kong:
Route-level ACL (acl plugin):
- Ollama route: No ACL — all authenticated consumers can access
- Gemini route: ACL group
gemini-accessrequired (team-lead, admin-user) - Anthropic route: ACL group
anthropic-accessrequired (admin-user only)
Per-user model filtering (post-function + pre-function plugins):
- Model list (
ai-models-filtered): Apost-functionplugin on the/v1/modelsroutes dynamically returns only the models the user is authorized for. It identifies the user via theX-OpenWebUI-User-Emailheader (forwarded by OpenWebUI) or the Kong consumer'scustom_id(for direct API access). - Chat enforcement (
ai-model-acl): Apre-functionplugin on chat routes validates the requested model against the user's role and returns 403 if unauthorized. Embedding requests (/v1/embeddings) are excluded from this check since RAG usesnomic-embed-textinternally.
This ensures users only see and can only use models matching their role — both in the OpenWebUI dropdown and via direct API calls.
The key-auth plugin accepts the key via the apikey header (for curl) or the Authorization: Bearer header (for OpenAI-compatible clients like OpenWebUI).
Consumer API keys are separate from provider API keys (Gemini, Anthropic). Provider keys are centrally managed in Kong — consumers never see them.
OpenWebUI Integration
OpenWebUI connects to Kong's AI proxy as an OpenAI-compatible backend. All traffic — including model discovery and RAG embeddings — is routed through Kong:
- Chat: All chat requests go through Kong (
/ollama/v1/chat/completions), enabling centralized authentication, logging, and metrics. With Enterprise, the internal route usesai-proxy-advanced(multi-model) so a single connection handles all providers. - Model discovery: Kong returns a per-user filtered model list via a
post-functionplugin on/ollama/v1/models. The plugin reads theX-OpenWebUI-User-Emailheader (forwarded whenENABLE_FORWARD_USER_INFO_HEADERS=true) and returns only the models the user is authorized for. The embedding model (nomic-embed-text) is excluded from the list. - RAG embeddings: OpenWebUI uses the OpenAI embedding engine (
RAG_EMBEDDING_ENGINE=openai) routed through Kong to Ollama. This prevents OpenWebUI from discovering Ollama models directly (which would bypass Kong's model filtering). - Access control bypass: OpenWebUI's built-in model access control is disabled (
BYPASS_MODEL_ACCESS_CONTROL=true) since Kong handles authorization. Users have theuserrole (DEFAULT_USER_ROLE=user) so they cannot modify OpenWebUI settings.
RAG (Retrieval-Augmented Generation)
OpenWebUI includes built-in RAG support with an embedded ChromaDB vector store. This showcase configures it to use Ollama's nomic-embed-text model for generating embeddings — completely local, no data leaves the cluster.
How it works:
- Upload documents (PDF, TXT, MD, etc.) via the OpenWebUI sidebar (
+button next to chat) - OpenWebUI extracts text and splits it into chunks
- Each chunk is embedded using
nomic-embed-textvia the local Ollama instance - Embeddings are stored in OpenWebUI's built-in ChromaDB
- When chatting, relevant chunks are retrieved and injected into the prompt as context
Usage:
- Open
https://chat.example.com:8081 - Click the
+icon in the sidebar to create a new knowledge base - Upload one or more documents
- Start a new chat, click
+in the message bar, and select the knowledge base - Ask questions about your documents — the model will answer based on the retrieved context
All processing happens locally: text extraction, embedding generation (Ollama), vector storage (ChromaDB), and LLM inference. No document content is sent to external providers unless you explicitly select Gemini or Anthropic as the chat model.
Monitoring (Prometheus + Grafana)
When monitoring is enabled, the following components are deployed:
Kong Prometheus Plugin — activated globally, exposes request metrics:
| Metric | Description |
|---|---|
kong_http_requests_total | Total requests by service (AI route), status code, consumer |
kong_bandwidth_bytes | Request/response bandwidth by service and direction |
kong_upstream_latency_ms | LLM provider response latency histogram |
kong_kong_latency_ms | Kong processing latency histogram |
AI Metrics Exporter — a lightweight Python service that receives LLM response data from Kong's http-log plugin and exposes token-level metrics:
| Metric | Description |
|---|---|
ai_llm_prompt_tokens_total | Prompt tokens by provider, model, consumer |
ai_llm_completion_tokens_total | Completion tokens by provider, model, consumer |
ai_llm_tokens_total | Total tokens by provider, model, consumer |
ai_llm_requests_total | Total LLM requests by provider, model, consumer |
ai_llm_estimated_cost_usd | Estimated cost in USD by provider, model, consumer |
The exporter resolves the actual user identity from OpenWebUI's X-OpenWebUI-User-Email header (forwarded by ENABLE_FORWARD_USER_INFO_HEADERS). For direct API access, the Kong consumer's custom_id (matching the Keycloak username) is used. This ensures consistent consumer labels (dev, lead, admin) regardless of the access path.
The exporter also parses responses from multiple providers: OpenAI/Ollama format (choices + usage), Gemini format (candidates + usageMetadata), and Anthropic format (content + usage.input_tokens/output_tokens).
Note: Token metrics require non-streaming responses. The
ai-proxy-advancedplugin on the internal route disables streaming by default. For streaming requests via direct API, token counts are not available.
The pre-installed Kong AI Gateway Grafana dashboard includes:
- Overview: Total AI requests, success/error counts, bandwidth
- Request Rate by Provider: Time series of request rates per LLM route
- Bandwidth by Provider: Ingress/egress data volume per provider
- Latency Percentiles: p50/p95/p99 upstream latency per provider
- Consumer & Status Analytics: Pie charts for request distribution
- Token Usage: Prompt, completion, and total token counts by provider and consumer
- Enterprise License: License TTL and error count (when Enterprise license is active)
Open https://grafana.example.com:8081 (admin / admin) and navigate to the Kong > Kong AI Gateway dashboard.
Enterprise: Keycloak OIDC & Consumer Mapping
With an Enterprise license, Keycloak is deployed as an identity provider with two authentication paths:
OpenWebUI Login (browser-based): OpenWebUI uses its built-in OIDC support for user authentication:
- User opens OpenWebUI → Login page with "Keycloak" button
- Browser redirects to Keycloak login page
- User authenticates (
dev/dev,lead/lead, oradmin/admin) - Keycloak redirects back to OpenWebUI with an authorization code
- OpenWebUI exchanges the code for tokens server-side
API Access with OIDC Token (programmatic): Kong's openid-connect plugin validates JWT tokens and maps the preferred_username claim to a Kong consumer via custom_id. This enables API key OR OIDC token authentication on all AI routes:
# Get a token from Keycloak (as 'dev' user)
TOKEN=$(curl -sk -X POST \
https://keycloak.example.com:8081/realms/ai-platform/protocol/openid-connect/token \
-d "client_id=kong-ai-gateway&client_secret=kong-ai-gateway-secret&username=dev&password=dev&grant_type=password" \
| jq -r .access_token)
# Use OIDC token with Kong (maps to dev-user consumer → Ollama only)
curl -k -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"Hello"}],"model":"llama3.2:1b"}' \
https://ai.example.com:8081/ollama/v1/chat/completionsThe mapping works via Kong's anonymous consumer pattern:
key-authplugin tries API key → if valid, consumer is set- If no API key, fallback to anonymous consumer
openid-connectplugin tries JWT token → if valid, consumer is mapped viapreferred_username→custom_id- If neither succeeds,
request-terminationplugin returns 401
Technical note: OpenWebUI connects to Keycloak's discovery endpoint via the cluster-internal URL (
http://keycloak.ai-platform.svc.cluster.local/realms/ai-platform/.well-known/openid-configuration), while browser redirects use the external URL (https://keycloak.example.com:8081). The Keycloak Service uses port 80 (targetPort 8080) so the internal URL has no port suffix.
Enterprise: AI Gateway Failover
With an Enterprise license, the ai-proxy-advanced plugin provides automatic multi-provider failover via a single unified endpoint:
https://ai.example.com:8081/ai/v1/chat/completionsFailover chain (priority algorithm):
┌─────────────────┐ fail ┌─────────────────┐ fail ┌─────────────────┐
│ Ollama (local) │────────────▶│ Google Gemini │────────────▶│ Anthropic API │
│ weight: 100 │ │ weight: 50 │ │ weight: 10 │
│ $0 (sovereign) │ │ $0 (free tier) │ │ $$ (paid) │
└─────────────────┘ └─────────────────┘ └─────────────────┘Configuration:
- Algorithm:
priority— routes to the highest-weight target first - Circuit breaker: After 2 consecutive failures (
max_fails: 2), a target is marked unavailable for 30 seconds - Failover criteria:
error,timeout,http_429,http_500,http_502,http_503 - Retries: Up to 3 attempts across the failover chain
Test failover:
# No model parameter needed — Kong selects automatically
curl -k -H "apikey: admin-key-12345" \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"Hello!"}]}' \
https://ai.example.com:8081/ai/v1/chat/completions
# Simulate Ollama failure: scale down Ollama, repeat the request
# Kong automatically fails over to Gemini (or Anthropic)
kubectl scale deployment ollama -n ai-platform --replicas=0
# ... repeat curl, observe failover ...
kubectl scale deployment ollama -n ai-platform --replicas=1This demonstrates the sovereign AI gateway pattern: local-first processing with automatic cloud fallback when the local model is unavailable.
Note: The unified
/aiendpoint exists alongside the individual model routes (/ollama,/coder,/gemma,/gemini,/anthropic). Direct routes give explicit model control; the failover route provides resilience.
Confidential Computing: Kuma Service Mesh with mTLS
When Kuma is enabled, all inter-service communication within the AI platform is encrypted with mutual TLS (mTLS). Kuma's sidecar proxies handle certificate management and rotation automatically.
What gets encrypted:
┌─────────────────────────────────────────────────────────────────┐
│ Kuma Mesh (mTLS) │
│ │
│ Kong Gateway has a Kuma sidecar for mesh membership │
│ Kuma GUI available at kuma-gui.example.com:8081/gui │
└─────────────────────────────────────────────────────────────────┘Current scope:
- Kong Gateway runs with a Kuma sidecar (mesh member)
- Kuma control plane with CNI mode installed
- MeshTrafficPermission policies configured (allow-all within mesh)
- Kuma GUI exposed via Kong HTTPRoute
Kong + Kuma integration requires three fixes:
KONG_DNS_ORDER=A,CNAME(in Helm values) — Kong prefers A records (ClusterIPs) over SRV records (Pod IPs) when resolving service DNS.konghq.com/service-upstream: "true"on meshed backend Services — forces KIC to pass the Service DNS name to Kong instead of resolving to Pod IPs. After enabling Kuma, the setup re-toggles these annotations to force KIC to re-read them (KIC caches endpoint targets from before Kuma was enabled).MeshProxyPatchresources — override Kuma's default HTTP/2 protocol on outbound clusters to HTTP/1.1 (required for Ollama, Keycloak, and OpenWebUI backends).
Note: Do NOT set
konghq.com/service-upstreamon Kong's own internal services (kong-gateway-manager,kong-gateway-admin). With multi-port services, KIC incorrectly selects the TLS port when this annotation is set. The manager/admin routes use HTTP backend ports (8002/8001) instead of HTTPS (8445/8444) to avoid self-signed certificate issues.
Verify mesh status:
# Check sidecar injection
kubectl get pods -n kong -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.containers[*]}{.name}{", "}{end}{"\n"}{end}'
# Check mesh policies
kubectl get meshtrafficpermissions -n kuma-cp
# View Kuma GUI
# https://kuma-gui.example.com:8081/guiData sovereignty by provider:
| Provider | Prompt leaves cluster? | Transport |
|---|---|---|
| Ollama (local) | No — full sovereignty | Internal cluster network |
| Gemini (cloud) | Yes — sent to Google | TLS (external) |
| Anthropic (cloud) | Yes — sent to Anthropic | TLS (external) |
For maximum data sovereignty, use Ollama only and leave cloud providers unconfigured. The AI Gateway Failover (Enterprise) can be configured to only fall back to cloud providers when the local model is unavailable — ensuring that prompts only leave the cluster in degraded scenarios, not during normal operation.
Enterprise: Prompt Guard
The ai-prompt-guard plugin inspects prompts before forwarding and blocks known injection patterns like "ignore previous instructions" or "reveal system prompt".