Deployment Guide
Deploy OGuardAI from local development to production Kubernetes clusters
This guide covers every deployment mode, from local development to production Kubernetes clusters.
Note: The current runtime uses sealed sessions (client-carried encrypted blobs), so no shared session store is required. Each instance is stateless.
Table of Contents
- Prerequisites
- Quick Start (Docker)
- Development (Local Build)
- Production (Docker Compose)
- Kubernetes (Helm)
- Configuration Reference
- Session Backend Selection
- Key Rotation Procedure
- Monitoring and Alerting
- Troubleshooting
- Rate Limiting in Multi-Instance Deployments
- Metrics in Multi-Instance Deployments
- Detector Mode Selection
Prerequisites
| Requirement | Version | Purpose |
|---|---|---|
| Docker | 24+ | Container runtime |
| Docker Compose | 2.20+ | Multi-service orchestration |
| Helm | 3.12+ | Kubernetes deployment (optional) |
| Kubernetes | 1.27+ | Container orchestration (optional) |
| Rust | 1.88+ | Building from source (optional) |
| Python | 3.10+ | NER detector sidecar (optional) |
| Node.js | 20+ | SDK and MCP server (optional) |
Quick Start (Docker)
Run OGuardAI with a single command using the pre-built Docker image:
```bash
docker run -p 8080:3000 \
  -e GUARDAI_SESSION_SECRET=$(openssl rand -base64 32) \
  ghcr.io/oronts/oronts-guardai/oguardai-server:latest
```

This starts OGuardAI with:
- Builtin regex detectors (30+ patterns)
- Sealed session backend (client-carried encrypted blobs)
- Dev auth mode (all requests accepted -- not for production)
- Default policy
Verify the server is running:
```bash
curl http://localhost:8080/v1/health
# {"status":"healthy","version":"0.1.0","uptime_seconds":1.2}
```

Test a transform:
```bash
curl -X POST http://localhost:8080/v1/transform \
  -H "Content-Type: application/json" \
  -d '{"input": "Contact julia@firma.de for help"}'
```

Expected response:
```json
{
  "safe_text": "Contact {{email:e_001}} for help",
  "session_id": "...",
  "session_state": "...",
  "entities": [{"token": "{{email:e_001}}", "type": "email"}]
}
```

Development (Local Build)
Build from Source
```bash
# Clone the repository
git clone https://github.com/oronts/oguardai.git
cd oguardai

# Build all Rust crates
cargo build --workspace

# Run all tests
cargo test --workspace

# Start the server with defaults
cargo run -p oguardai-server -- --config oguardai.yaml
```

Minimal Development Config
Create oguardai.yaml for local development:
```yaml
server:
  host: "127.0.0.1"
  port: 3000

auth:
  mode: dev

session:
  backend: sealed
  secret: "dev-only-secret-change-in-prod!!"
  ttl_seconds: 3600

detector:
  mode: builtin

policy:
  default: default
  directory: policies

transform:
  context_strategy: full
  max_context_tokens: 4096
```

Running with Python Detector (Development)
```bash
# Terminal 1: Start Python NER detector
cd apps/detector-py
uv sync
uv run uvicorn guardai_detector_service.main:app --host 0.0.0.0 --port 9090

# Terminal 2: Start Rust server with detector URL
GUARDAI_DETECTOR_URL=http://localhost:9090 \
  cargo run -p oguardai-server -- --config oguardai.yaml
```

Production (Docker Compose)
Docker Compose deploys two services: the OGuardAI server and the Python NER detector. Sessions use the sealed backend (client-carried encrypted blobs), so no external session store is required.
Environment Setup
Create a .env file (never commit this to version control):
```bash
# Required: 32-byte session encryption secret
GUARDAI_SESSION_SECRET=your-production-secret-32-bytes!!

# Optional: log level
GUARDAI_LOG_LEVEL=guardai_server=info,tower_http=info
```

Start the Full Stack
```bash
docker compose -f deploy/docker/docker-compose.yml up --build
```

This starts:
| Service | Port | Description |
|---|---|---|
| server | 3000 | OGuardAI Rust server |
| detector | 9090 | Python NER detector (GLiNER/spaCy) |
Production Docker Compose Configuration
The production docker-compose.yml at deploy/docker/docker-compose.yml includes:
- Health checks on both services (HTTP health endpoints for server and detector).
- Restart policy: `unless-stopped` for automatic recovery.
- Volume mounts: Policies directory mounted read-only.
- Dependency ordering: Server waits for detector to be healthy before starting.
Minimal Docker Compose (Server Only)
For deployments that do not need advanced NER or shared sessions:
```bash
docker compose -f deploy/docker/docker-compose.minimal.yml up --build
```

This starts only the OGuardAI server with builtin detectors and sealed sessions.
Kubernetes (Helm)
Install the Helm Chart
```bash
# Add the OGuardAI Helm repository (when published)
helm repo add guardai https://charts.OGuard.ai
helm repo update

# Install with default values
helm install oguardai guardai/guardai \
  --namespace oguardai \
  --create-namespace \
  --set session.secret="your-production-secret-32-bytes!!"

# Or install from local chart
helm install oguardai deploy/helm/oguardai \
  --namespace oguardai \
  --create-namespace \
  --set session.secret="your-production-secret-32-bytes!!"
```

Production Helm Values
Create values-production.yaml:
```yaml
server:
  replicaCount: 3
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: "2"
      memory: 1Gi

session:
  backend: sealed
  ttlSeconds: 3600
  # Reference an existing Kubernetes secret
  existingSecret: oguardai-session-secret
  existingSecretKey: session-secret

auth:
  mode: api_key
  api_keys:
    - key: sk-your-api-key-here
      identity: my-service
      scopes:
        - transform
        - rehydrate
        - detect

policy:
  default: default
  directory: /app/policies

transform:
  contextStrategy: full
  maxContextTokens: 4096

# Enable Python NER detector
detector:
  enabled: true
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 2Gi

# Enable horizontal pod autoscaler
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

# Pod security
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

securityContext:
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
```

Install with production values:
```bash
helm install oguardai deploy/helm/oguardai \
  --namespace oguardai \
  --create-namespace \
  -f values-production.yaml
```

Using External Secrets
To use an external secret manager (Vault, AWS Secrets Manager) with the Helm chart:
```bash
# Create the Kubernetes secret first
kubectl create secret generic oguardai-session-secret \
  --namespace oguardai \
  --from-literal=session-secret="your-production-secret-32-bytes!!"

# Reference it in Helm values
helm install oguardai deploy/helm/oguardai \
  --set session.existingSecret=oguardai-session-secret \
  --set session.existingSecretKey=session-secret
```

Helm Chart Components
The chart deploys the following Kubernetes resources:
| Resource | Purpose |
|---|---|
| Deployment | OGuardAI server pods |
| Service | ClusterIP service for internal access |
| ConfigMap | oguardai.yaml configuration |
| Secret | Session encryption secret |
| ServiceAccount | Pod identity |
| HPA | Horizontal pod autoscaler (optional) |
Configuration Reference
The oguardai.yaml file controls all server behavior. Every field has a sensible default.
Server Section
```yaml
server:
  host: "0.0.0.0"    # Bind address (default: 0.0.0.0)
  port: 3000         # Bind port (default: 3000)
```

Auth Section
```yaml
auth:
  mode: api_key              # dev | api_key | jwt (default: dev)
  api_keys:                  # Only used when mode=api_key
    - key: sk-...
      identity: my-service
      tenant_id: tenant-1    # Optional: multi-tenancy scope
      scopes:                # Optional: restrict to specific scopes
        - transform
        - rehydrate
  # JWT config (only used when mode=jwt)
  jwt:
    secret: "your-hmac-secret"
    issuer: "https://auth.example.com"
```

Note: The `api_keys` entries use `key`, `identity`, `tenant_id`, and `scopes` fields.
Session Section
```yaml
session:
  backend: sealed      # sealed (default: sealed)
  secret: "..."        # 32-byte encryption secret (REQUIRED for production)
  ttl_seconds: 3600    # Session TTL in seconds (default: 3600)
```

The GUARDAI_SESSION_SECRET environment variable overrides session.secret.
Note: Only the `sealed` backend is currently implemented. The `memory` and `redis` backends are planned for a future release.
Detector Section
```yaml
detector:
  mode: builtin                           # builtin | advanced | both (default: builtin)
  advanced_url: "http://detector:9090"    # URL of Python NER sidecar
  default_language: null                  # ISO 639-1 code, or null for auto-detect
```

| Mode | Behavior |
|---|---|
| builtin | Rust regex patterns only (30+ patterns) |
| advanced | Python NER sidecar only |
| both | Both builtin and advanced; merge results |
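The merge step in `both` mode can be pictured as combining two entity lists and resolving overlapping spans. The exact merge rule is internal to OGuardAI; the sketch below assumes a simple "higher confidence wins on overlap" policy, with hypothetical entity dicts:

```python
# Illustrative sketch of detector.mode "both": merge builtin regex hits with
# NER sidecar hits. ASSUMPTION: overlapping spans are resolved by confidence;
# the real server's merge rule may differ.

def overlaps(a, b):
    """True if two (start, end) character spans overlap."""
    return a[0] < b[1] and b[0] < a[1]

def merge_detections(builtin, advanced):
    """Merge two detection lists; on span overlap, keep the higher-confidence
    entity. Each entity: {"span": (start, end), "type": ..., "confidence": ...}."""
    merged = list(builtin)
    for ent in advanced:
        clashes = [m for m in merged if overlaps(m["span"], ent["span"])]
        if not clashes:
            merged.append(ent)
        elif all(ent["confidence"] > m["confidence"] for m in clashes):
            merged = [m for m in merged if not overlaps(m["span"], ent["span"])]
            merged.append(ent)
    return sorted(merged, key=lambda m: m["span"][0])

builtin = [{"span": (8, 22), "type": "email", "confidence": 0.99}]
advanced = [
    {"span": (0, 5), "type": "person", "confidence": 0.91},
    {"span": (8, 22), "type": "email", "confidence": 0.85},  # overlaps builtin hit
]
result = merge_detections(builtin, advanced)
print([e["type"] for e in result])  # ['person', 'email']
```

The builtin email hit survives (higher confidence), while the person entity, which only the NER detector can find, is added.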
Policy Section
```yaml
policy:
  default: default       # Default policy name (default: "default")
  directory: policies    # Directory containing policy YAML files (default: "policies")
```

Transform Section
```yaml
transform:
  context_strategy: full      # full | type_summary | referenced_only | none (default: full)
  max_context_tokens: 4096    # Max tokens for entity_context (default: 4096)
```

Output Protection Section
```yaml
output_protection:
  enabled: false          # Enable output guard (default: false)
  mode: strict            # strict | permissive (default: strict)
  default_action: mask    # mask | block | tokenize | allow (default: mask)
  exempt_types:           # Entity types exempt from output protection
    - greeting
```

Prompt Security Section
```yaml
prompt_security:
  enabled: true    # Enable prompt injection scanning (default: true)
  action: warn     # warn | strip | block (default: warn)
```

File Upload Section
```yaml
file_upload:
  max_size_bytes: 52428800    # 50MB default
```

Rate Limit Section
```yaml
rate_limit:
  enabled: false              # Enable rate limiting (default: false)
  requests_per_second: 100    # Global rate limit (default: 100)
  burst_size: 200             # Burst allowance (default: 200)
```

Tenants Section
```yaml
tenants:
  acme-corp:
    default_policy: gdpr-strict
    rate_limit:
      requests_per_second: 50
      burst_size: 100
  startup-inc:
    default_policy: default
    rate_limit:
      requests_per_second: 200
      burst_size: 400
```

Environment Variable Overrides
| Environment Variable | Overrides | Example |
|---|---|---|
| GUARDAI_SESSION_SECRET | session.secret | 32-byte production secret |
| GUARDAI_DETECTOR_URL | detector.advanced_url | http://detector:9090 |
| RUST_LOG | Log level filter | guardai_server=info,tower_http=info |
Session Backend Selection
| Backend | Use Case | Status |
|---|---|---|
| Sealed (default) | Stateless deployments, horizontal scaling, edge deployments | Available |
| Memory | Development, testing, single-instance demos | Planned -- not yet implemented |
| Redis | Multi-instance production, shared state, server-side session management | Planned -- not yet implemented |
Sealed Backend
The sealed backend is the only currently available session backend. Session state is encrypted (AES-GCM) and returned to the client as an opaque blob. The client must carry this blob between transform and rehydrate calls.
Advantages:
- No server-side state, no external dependencies, and unconstrained horizontal scaling.
- The client controls the session lifecycle.
Considerations:
- Blob size grows with entity count.
- The client must carry the blob between transform and rehydrate requests.
- Best suited when entity counts per request are moderate (< 100 entities).
Planned backends (not yet available): the `memory` and `redis` backends are planned for a future release to support server-side session management and multi-instance deployments with shared state.
Key Rotation Procedure
OGuardAI supports key rotation for session encryption without downtime.
Step 1: Generate New Secret
```bash
# Generate a cryptographically random 32-byte secret
openssl rand -base64 32
```

Step 2: Configure Key Ring
Update oguardai.yaml (or Kubernetes secret) with both the old and new keys:
```yaml
session:
  secret: "new-production-secret-32-bytes!!"
  # The old key is retained internally with kid=0
  # New sessions use the new key with kid=1
```

Step 3: Deploy
Deploy the updated configuration. The behavior during rotation:
| Session Type | Behavior |
|---|---|
| New sessions created after deployment | Encrypted with new key (kid=1) |
| Existing sessions created before deployment | Still decryptable with old key (kid=0) |
Step 4: Wait for TTL Expiry
After ttl_seconds has elapsed since the last session was created with the old key, all old sessions have expired.
Step 5: Remove Old Key
After the TTL period, the old key can be safely removed. All remaining sessions use the new key.
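The kid-based key ring can be pictured with a toy implementation. OGuardAI seals sessions with AES-GCM; the stand-in below uses stdlib HMAC-signed JSON purely to illustrate how a key id embedded in the blob selects the decryption key during rotation — it is not the real wire format:

```python
# Toy key-ring sketch for the rotation flow above. ASSUMPTION: HMAC-signed
# JSON stands in for AES-GCM sealing; only the kid-lookup logic mirrors the
# documented behavior (new sessions use kid=1, old blobs still unseal via kid=0).
import base64, hashlib, hmac, json

KEYS = {0: b"old-production-secret-32-bytes!!",
        1: b"new-production-secret-32-bytes!!"}
ACTIVE_KID = 1  # new sessions are sealed with the newest key

def seal(state: dict) -> str:
    body = json.dumps({"kid": ACTIVE_KID, "state": state}).encode()
    tag = hmac.new(KEYS[ACTIVE_KID], body, hashlib.sha256).hexdigest()
    return base64.b64encode(body).decode() + "." + tag

def unseal(blob: str) -> dict:
    encoded, tag = blob.rsplit(".", 1)
    body = base64.b64decode(encoded)
    kid = json.loads(body)["kid"]
    key = KEYS.get(kid)  # old sessions still resolve via kid=0
    if key is None or not hmac.compare_digest(
            hmac.new(key, body, hashlib.sha256).hexdigest(), tag):
        raise ValueError("session_tampered")
    return json.loads(body)["state"]

blob = seal({"e_001": "julia@firma.de"})
assert unseal(blob) == {"e_001": "julia@firma.de"}
```

Removing kid=0 from the ring (Step 5) makes any remaining old blobs fail to unseal, which is safe once their TTL has elapsed.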
Key Rotation Timeline
```
Time ------------------------------------------------------------>

  Old Key Active  |  Both Keys Active  |  New Key Only
  (all sessions)  |  (old sessions     |  (old sessions
                  |   still decrypt)   |   expired)
                  |                    |
              Deploy               TTL expires,
              new key              remove old key
```

Monitoring and Alerting
Health Check
The /v1/health endpoint returns the overall system status:
```bash
curl http://localhost:3000/v1/health
```

Response:
```json
{
  "status": "healthy",
  "version": "0.1.0",
  "uptime_seconds": 3600.5,
  "components": {
    "detector": {"status": "healthy"},
    "session": {"status": "healthy"},
    "policy": {"status": "healthy"}
  }
}
```

Status values: healthy, degraded (partial functionality), unhealthy (service failure).
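A natural way to derive the overall status from the component statuses is a worst-wins rule. The guide defines the three status values but not the exact aggregation, so the helper below is an assumption:

```python
# ASSUMPTION: overall health is the worst status among components
# (unhealthy > degraded > healthy). The real aggregation rule is internal
# to OGuardAI's /v1/health handler.
def overall_status(components: dict) -> str:
    statuses = {c["status"] for c in components.values()}
    if "unhealthy" in statuses:
        return "unhealthy"
    if "degraded" in statuses:
        return "degraded"
    return "healthy"

components = {"detector": {"status": "degraded"},
              "session": {"status": "healthy"},
              "policy": {"status": "healthy"}}
print(overall_status(components))  # degraded
```

This matches the monitoring table below: a failed detector sidecar degrades the service (builtin fallback) rather than taking it down.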
What to Monitor
| Metric | Source | Alert Threshold |
|---|---|---|
| HTTP response codes | Access logs / reverse proxy | 5xx rate > 1% |
| Request latency (p95) | Structured logs (latency_ms field) | > 500ms for transform, > 200ms for rehydrate |
| Health check status | /v1/health endpoint | Status != "healthy" |
| Entity detection rate | Structured logs (entity_count field) | Sudden drop may indicate detector failure |
| Session seal/unseal errors | Structured logs (error events) | Any session_expired or session_tampered spike |
| Rate limit rejections | Structured logs | Sustained 429 responses |
| Container restarts | Kubernetes / Docker | > 0 in 5-minute window |
| Memory usage | Container metrics | > 80% of limit |
| CPU usage | Container metrics | Sustained > 70% |
| Detector sidecar | Health check components | Status = "degraded" (fallback to builtin only) |
Recommended Alert Rules
```yaml
# Example Prometheus alerting rules
groups:
  - name: guardai
    rules:
      - alert: OGuardAIUnhealthy
        expr: probe_success{job="guardai-health"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "OGuardAI health check failing"

      - alert: OGuardAIHighLatency
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{job="guardai"}[5m])) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "OGuardAI p95 latency above 500ms"

      - alert: OGuardAIHighErrorRate
        expr: rate(http_requests_total{job="guardai",status=~"5.."}[5m]) / rate(http_requests_total{job="guardai"}[5m]) > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "OGuardAI 5xx error rate above 1%"
```

Log Aggregation
OGuardAI emits structured JSON logs compatible with all major SIEM and log aggregation systems:
```json
{
  "timestamp": "2026-04-15T10:30:00Z",
  "level": "INFO",
  "target": "guardai_server::pipeline",
  "message": "transform completed",
  "request_id": "req-abc-123",
  "session_id": "sess-def-456",
  "tenant_id": "acme-corp",
  "entity_count": 3,
  "latency_ms": 12.5,
  "policy": "gdpr-strict"
}
```

Forward logs to your SIEM using standard log collectors (Fluent Bit, Fluentd, Vector, Filebeat).
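Because the logs are structured JSON, latency percentiles can be computed directly from the latency_ms field, for example as a quick offline check before dashboards exist. The snippet below uses synthetic log lines and a nearest-rank p95:

```python
# Offline sketch: compute p95 latency from the latency_ms field of the
# structured JSON logs shown above. Log values here are synthetic.
import json

log_lines = [
    json.dumps({"message": "transform completed", "latency_ms": ms})
    for ms in [10.0, 12.5, 11.0, 480.0, 9.5, 13.0, 12.0, 10.5, 11.5, 600.0]
]

def p95_latency(lines):
    latencies = sorted(json.loads(line)["latency_ms"] for line in lines
                       if "latency_ms" in line)
    # nearest-rank percentile: index ceil(0.95 * n) - 1
    idx = max(0, -(-len(latencies) * 95 // 100) - 1)
    return latencies[idx]

print(p95_latency(log_lines))  # 600.0
```

The same field feeds the "Request latency (p95)" alert threshold in the monitoring table above.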
Troubleshooting
Server does not start
Symptom: oguardai-server exits immediately.
Causes and fixes:
| Cause | Fix |
|---|---|
| Port already in use | Change server.port in config or stop conflicting process |
| Invalid YAML config | Run oguardai config validate or check YAML syntax |
| Missing policies directory | Create policies/ directory or update policy.directory |
"session expired" errors on rehydrate
Symptom: Rehydrate returns {"error": {"code": "session_expired"}}.
Causes and fixes:
| Cause | Fix |
|---|---|
| Session TTL exceeded | Increase session.ttl_seconds or call rehydrate sooner |
| Different session secret between transform and rehydrate | Ensure all server instances use the same GUARDAI_SESSION_SECRET |
| Key rotation mid-flight | Ensure both old and new keys are active during rotation |
Detector sidecar not connecting
Symptom: Health check shows "detector": {"status": "degraded"}.
Causes and fixes:
| Cause | Fix |
|---|---|
| Sidecar not started | Check docker compose logs detector |
| Wrong URL | Verify detector.advanced_url matches sidecar address |
| Sidecar still loading models | Wait for sidecar health check to pass (may take 20-30s on first start) |
| Network isolation | Ensure server and detector are on the same Docker/Kubernetes network |
High latency on first request
Symptom: First request takes significantly longer than subsequent requests.
Cause: Regex patterns and detection models are compiled on first use.
Fix: This is expected behavior. Subsequent requests reuse compiled patterns. For consistent latency, send a warmup request after deployment.
Entity not detected
Symptom: Known PII is not being replaced.
Causes and fixes:
| Cause | Fix |
|---|---|
| Entity type not in policy | Check policy YAML includes the entity type |
| Below confidence threshold | Lower threshold in detect request or policy |
| Builtin-only mode | Names and companies require the Python NER sidecar |
| Non-standard format | Builtin patterns cover common formats; unusual formats may need custom patterns |
Session seal/unseal failures
Symptom: "session": {"status": "unhealthy"} in health check or session_tampered errors.
Causes and fixes:
| Cause | Fix |
|---|---|
| Wrong or missing session secret | Verify GUARDAI_SESSION_SECRET is set and matches across all instances |
| Key rotation mid-flight | Ensure both old and new keys are active during rotation period |
| Tampered or corrupted blob | Client must pass the exact session_state blob received from transform |
| Session TTL exceeded | Increase session.ttl_seconds or call rehydrate sooner |
Rate Limiting in Multi-Instance Deployments
OGuardAI's rate limiter operates per-instance. Each server process maintains independent rate limit buckets.
Implications for Kubernetes / multi-instance:
| Deployment | Effective Rate Limit | Example |
|---|---|---|
| Single instance, limit=100/s | 100/s total | As configured |
| 3 instances, limit=100/s each | Up to 300/s total | Clients hitting different instances get 100/s each |
| 3 instances behind load balancer | ~100/s per client (sticky) or ~300/s (round-robin) | Depends on LB strategy |
Recommended configurations:
For strict per-client limits: Use sticky sessions (session affinity) in your load balancer so each client always hits the same instance.
For global limits: Divide the desired global limit by the number of instances:
```yaml
# 3 instances, want 300 req/s global -> 100/s per instance
rate_limit:
  enabled: true
  requests_per_second: 100
  burst_size: 200
```

For shared rate limiting (advanced): Deploy a Redis-backed rate limiter in front of OGuardAI (e.g., Kong, Envoy, or API gateway rate limiting). OGuardAI's built-in rate limiter then serves as a secondary defense.
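The division guidance above can be sketched as a one-line calculation; rounding down keeps the aggregate at or below the global target when the limit does not divide evenly:

```python
# Sketch of "divide the global limit by instance count". With per-instance
# token buckets, the aggregate limit is roughly per_instance * replicas,
# so each instance should get global // replicas.
def per_instance_limit(global_rps: int, replicas: int) -> int:
    if replicas < 1:
        raise ValueError("need at least one replica")
    return global_rps // replicas  # round down: aggregate stays <= global

assert per_instance_limit(300, 3) == 100       # matches the example config
assert per_instance_limit(100, 3) * 3 <= 100   # never exceeds the target
```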
Metrics in Multi-Instance Deployments
Each OGuardAI instance exposes its own /metrics endpoint. Prometheus scrapes all instances independently.
Prometheus scrape config:
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'guardai'
    kubernetes_sd_configs:
      - role: pod
        selectors:
          - role: pod
            label: app=oguardai-server
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_ip]
        target_label: __address__
        replacement: '$1:3000'
```

Aggregation in PromQL:
```
# Total transforms across all instances
sum(guardai_transforms_total)

# Per-instance error rate
rate(guardai_errors_total[5m])

# p95 latency across all instances
histogram_quantile(0.95, sum(rate(guardai_transform_duration_seconds_bucket[5m])) by (le))
```

Key point: per-instance metrics are the standard pattern for Prometheus-based observability; no shared metrics backend is needed.
Detector Mode Selection
Quick Decision Guide
| Your Use Case | Recommended Mode | Why |
|---|---|---|
| Low-latency API protection | builtin | Sub-millisecond, no dependencies |
| Full PII coverage (names, companies) | both | Builtin + NER gives best coverage |
| Maximum accuracy, latency acceptable | advanced | All detection via trained models |
| NER sidecar sometimes unavailable | both | Graceful fallback to builtin |
How to Check Current Mode
```bash
# Health endpoint shows detector status
curl http://localhost:3000/v1/health | jq '.components.detector'

# Capabilities shows what entity types are available
curl http://localhost:3000/v1/capabilities | jq '.entity_types[].name'

# Diagnostics shows full config
curl http://localhost:3000/v1/diagnostics | jq '.detector_mode'
```

NER Sidecar Availability
When detector.mode: both is configured:
| NER Status | Health Shows | Capabilities Shows | Behavior |
|---|---|---|---|
| Running + healthy | builtin_and_ner (full entity detection) | 18 entity types | Full detection |
| Not running | Same (optimistic) | 18 types (optimistic) | 5s timeout per request, then fallback to builtin |
| Not configured | builtin_only (person/company/location unavailable) | 15 entity types | Builtin only, no timeout |
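The "timeout, then fall back to builtin" row can be pictured as a try/except around the sidecar call. The function names below are illustrative, not the server's internals; the real implementation enforces the 5s timeout at the HTTP client level:

```python
# Sketch of the documented fallback in detector.mode "both": if the NER
# sidecar call fails (timeout, connection error), serve builtin-only results.
# Function names are hypothetical illustrations.
def detect_with_fallback(text, advanced_detect, builtin_detect):
    """Return builtin + advanced results; on sidecar failure, builtin only."""
    builtin_entities = builtin_detect(text)
    try:
        # the real server enforces a 5s timeout on this call
        return builtin_entities + advanced_detect(text)
    except Exception:
        return builtin_entities

def builtin_detect(text):
    return [{"type": "email"}] if "@" in text else []

def broken_sidecar(text):
    raise ConnectionError("sidecar down")

result = detect_with_fallback("mail julia@firma.de", broken_sidecar, builtin_detect)
print(result)  # [{'type': 'email'}]
```

This is why a down sidecar shows up as added per-request latency rather than hard failures: every request pays the timeout before falling back.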
Latency Troubleshooting
If transform latency is high (>1 second p50):
- Check the detector mode: `curl /v1/diagnostics | jq .detector_mode`
- If the mode is "Both": check whether the NER sidecar is running at the configured URL.
- If NER is down: either start the sidecar or switch to `detector.mode: builtin`.
- Verify: `curl /metrics | grep transform_duration_seconds`
This document is maintained alongside the OGuardAI source code. For the latest version, see the repository at docs/deployment-guide.md.