Deployment Guide
Deploy OGuardAI from local development to production Kubernetes clusters
This guide covers every deployment mode, from local development to production Kubernetes clusters.
Note: The current runtime uses sealed sessions (client-carried encrypted blobs), so no shared session store is required. Each instance is stateless.
Table of Contents
- Prerequisites
- Quick Start (Docker)
- Development (Local Build)
- Production (Docker Compose)
- Kubernetes (Helm)
- Configuration Reference
- Session Backend Selection
- Key Rotation Procedure
- Monitoring and Alerting
- Troubleshooting
- Rate Limiting in Multi-Instance Deployments
- Metrics in Multi-Instance Deployments
- Detector Mode Selection
Prerequisites
| Requirement | Version | Purpose |
|---|---|---|
| Docker | 24+ | Container runtime |
| Docker Compose | 2.20+ | Multi-service orchestration |
| Helm | 3.12+ | Kubernetes deployment (optional) |
| Kubernetes | 1.27+ | Container orchestration (optional) |
| Rust | 1.88+ | Building from source (optional) |
| Python | 3.10+ | NER detector sidecar (optional) |
| Node.js | 20+ | SDK and MCP server (optional) |
Quick Start (Docker)
Run OGuardAI with a single command using the pre-built Docker image:
```bash
docker run -p 8080:3000 \
  -e GUARDAI_SESSION_SECRET=$(openssl rand -base64 32) \
  ghcr.io/oronts/oronts-guardai/oguardai-server:latest
```

This starts OGuardAI with:
- Builtin regex detectors (30+ patterns)
- Sealed session backend (client-carried encrypted blobs)
- Dev auth mode (all requests accepted -- not for production)
- Default policy
Verify the server is running:
```bash
curl http://localhost:8080/v1/health
# {"status":"healthy","version":"0.1.0","uptime_seconds":1.2}
```

Test a transform:
```bash
curl -X POST http://localhost:8080/v1/transform \
  -H "Content-Type: application/json" \
  -d '{"input": "Contact julia@firma.de for help"}'
```

Expected response:
```json
{
  "safe_text": "Contact {{email:e_001}} for help",
  "session_id": "...",
  "session_state": "...",
  "entities": [{"token": "{{email:e_001}}", "type": "email"}]
}
```

Development (Local Build)
Build from Source
```bash
# Clone the repository
git clone https://github.com/oronts/oguardai.git
cd oguardai

# Build all Rust crates
cargo build --workspace

# Run all tests
cargo test --workspace

# Start the server with defaults
cargo run -p oguardai-server -- --config oguardai.yaml
```

Minimal Development Config
Create oguardai.yaml for local development:
```yaml
server:
  host: "127.0.0.1"
  port: 3000

auth:
  mode: dev

session:
  backend: sealed
  secret: "dev-only-secret-change-in-prod!!"
  ttl_seconds: 3600

detector:
  mode: builtin

policy:
  default: default
  directory: policies

transform:
  context_strategy: full
  max_context_tokens: 4096
```

Running with Python Detector (Development)
```bash
# Terminal 1: Start Python NER detector
cd apps/detector-py
uv sync
uv run uvicorn guardai_detector_service.main:app --host 0.0.0.0 --port 9090

# Terminal 2: Start Rust server with detector URL
GUARDAI_DETECTOR_URL=http://localhost:9090 \
  cargo run -p oguardai-server -- --config oguardai.yaml
```

Production (Docker Compose)
Docker Compose deploys two services: the OGuardAI server and the Python NER detector. Sessions use the sealed backend (client-carried encrypted blobs), so no external session store is required.
Environment Setup
Create a .env file (never commit this to version control):
```bash
# Required: 32-byte session encryption secret
GUARDAI_SESSION_SECRET=your-production-secret-32-bytes!!

# Optional: log level
GUARDAI_LOG_LEVEL=guardai_server=info,tower_http=info
```

Start the Full Stack
```bash
docker compose -f deploy/docker/docker-compose.yml up --build
```

This starts:
| Service | Port | Description |
|---|---|---|
| server | 3000 | OGuardAI Rust server |
| detector | 9090 | Python NER detector (GLiNER/spaCy) |
Production Docker Compose Configuration
The production docker-compose.yml at deploy/docker/docker-compose.yml includes:
- Health checks on both services (HTTP health endpoints for server and detector).
- Restart policy: `unless-stopped` for automatic recovery.
- Volume mounts: Policies directory mounted read-only.
- Dependency ordering: Server waits for detector to be healthy before starting.
Minimal Docker Compose (Server Only)
For deployments that do not need advanced NER or shared sessions:
```bash
docker compose -f deploy/docker/docker-compose.minimal.yml up --build
```

This starts only the OGuardAI server with builtin detectors and sealed sessions.
Kubernetes (Helm)
Install the Helm Chart
```bash
# Add the OGuardAI Helm repository (when published)
helm repo add guardai https://charts.OGuard.ai
helm repo update

# Install with default values
helm install oguardai guardai/guardai \
  --namespace oguardai \
  --create-namespace \
  --set session.secret="your-production-secret-32-bytes!!"

# Or install from local chart
helm install oguardai deploy/helm/oguardai \
  --namespace oguardai \
  --create-namespace \
  --set session.secret="your-production-secret-32-bytes!!"
```

Production Helm Values
Create values-production.yaml:
```yaml
server:
  replicaCount: 3
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: "2"
      memory: 1Gi

session:
  backend: sealed
  ttlSeconds: 3600
  # Reference an existing Kubernetes secret
  existingSecret: oguardai-session-secret
  existingSecretKey: session-secret

auth:
  mode: api_key
  api_keys:
    - key: sk-your-api-key-here
      identity: my-service
      scopes:
        - transform
        - rehydrate
        - detect

policy:
  default: default
  directory: /app/policies

transform:
  contextStrategy: full
  maxContextTokens: 4096

# Enable Python NER detector
detector:
  enabled: true
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 2Gi

# Enable horizontal pod autoscaler
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

# Pod security
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

securityContext:
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
```

Install with production values:
```bash
helm install oguardai deploy/helm/oguardai \
  --namespace oguardai \
  --create-namespace \
  -f values-production.yaml
```

Using External Secrets
To use an external secret manager (Vault, AWS Secrets Manager) with the Helm chart:
```bash
# Create the Kubernetes secret first
kubectl create secret generic oguardai-session-secret \
  --namespace oguardai \
  --from-literal=session-secret="your-production-secret-32-bytes!!"

# Reference it in Helm values
helm install oguardai deploy/helm/oguardai \
  --set session.existingSecret=oguardai-session-secret \
  --set session.existingSecretKey=session-secret
```

Helm Chart Components
The chart deploys the following Kubernetes resources:
| Resource | Purpose |
|---|---|
| Deployment | OGuardAI server pods |
| Service | ClusterIP service for internal access |
| ConfigMap | oguardai.yaml configuration |
| Secret | Session encryption secret |
| ServiceAccount | Pod identity |
| HPA | Horizontal pod autoscaler (optional) |
Configuration Reference
The oguardai.yaml file controls all server behavior. Every field has a sensible default.
Server Section
```yaml
server:
  host: "0.0.0.0"    # Bind address (default: 0.0.0.0)
  port: 3000         # Bind port (default: 3000)
```

Auth Section
```yaml
auth:
  mode: api_key              # dev | api_key | jwt (default: dev)
  api_keys:                  # Only used when mode=api_key
    - key: sk-...
      identity: my-service
      tenant_id: tenant-1    # Optional: multi-tenancy scope
      scopes:                # Optional: restrict to specific scopes
        - transform
        - rehydrate
  # JWT config (only used when mode=jwt)
  jwt:
    secret: "your-hmac-secret"
    issuer: "https://auth.example.com"
```

Note: The `api_keys` entries use `key`, `identity`, `tenant_id`, and `scopes` fields.
Session Section
```yaml
session:
  backend: sealed      # sealed (default: sealed)
  secret: "..."        # 32-byte encryption secret (REQUIRED for production)
  ttl_seconds: 3600    # Session TTL in seconds (default: 3600)
```

The GUARDAI_SESSION_SECRET environment variable overrides session.secret.
Note: Only the `sealed` backend is currently implemented. The `memory` and `redis` backends are planned for a future release.
Detector Section
```yaml
detector:
  mode: builtin                           # builtin | advanced | both (default: builtin)
  advanced_url: "http://detector:9090"    # URL of Python NER sidecar
  default_language: null                  # ISO 639-1 code, or null for auto-detect
```

| Mode | Behavior |
|---|---|
| builtin | Rust regex patterns only (30+ patterns) |
| advanced | Python NER sidecar only |
| both | Both builtin and advanced; merge results |
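The merge step in `both` mode can be pictured as combining two entity lists and resolving overlapping spans. The exact merge rule is internal to OGuardAI; the sketch below assumes a simple "higher confidence wins on overlap" policy, with hypothetical entity dicts:

```python
# Illustrative sketch of detector.mode "both": merge builtin regex hits with
# NER sidecar hits. ASSUMPTION: overlapping spans are resolved by confidence;
# the real server's merge rule may differ.

def overlaps(a, b):
    """True if two (start, end) character spans overlap."""
    return a[0] < b[1] and b[0] < a[1]

def merge_detections(builtin, advanced):
    """Merge two detection lists; on span overlap, keep the higher-confidence
    entity. Each entity: {"span": (start, end), "type": ..., "confidence": ...}."""
    merged = list(builtin)
    for ent in advanced:
        clashes = [m for m in merged if overlaps(m["span"], ent["span"])]
        if not clashes:
            merged.append(ent)
        elif all(ent["confidence"] > m["confidence"] for m in clashes):
            merged = [m for m in merged if not overlaps(m["span"], ent["span"])]
            merged.append(ent)
    return sorted(merged, key=lambda m: m["span"][0])

builtin = [{"span": (8, 22), "type": "email", "confidence": 0.99}]
advanced = [
    {"span": (0, 5), "type": "person", "confidence": 0.91},
    {"span": (8, 22), "type": "email", "confidence": 0.85},  # overlaps builtin hit
]
result = merge_detections(builtin, advanced)
print([e["type"] for e in result])  # ['person', 'email']
```

The builtin email hit survives (higher confidence), while the person entity, which only the NER detector can find, is added.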
Policy Section
```yaml
policy:
  default: default       # Default policy name (default: "default")
  directory: policies    # Directory containing policy YAML files (default: "policies")
```

Transform Section
```yaml
transform:
  context_strategy: full      # full | type_summary | referenced_only | none (default: full)
  max_context_tokens: 4096    # Max tokens for entity_context (default: 4096)
```

Output Protection Section
```yaml
output_protection:
  enabled: false          # Enable output guard (default: false)
  mode: strict            # strict | permissive (default: strict)
  default_action: mask    # mask | block | tokenize | allow (default: mask)
  exempt_types:           # Entity types exempt from output protection
    - greeting
```

Prompt Security Section
```yaml
prompt_security:
  enabled: true    # Enable prompt injection scanning (default: true)
  action: warn     # warn | strip | block (default: warn)
```

File Upload Section
```yaml
file_upload:
  max_size_bytes: 52428800    # 50MB default
```

Rate Limit Section
```yaml
rate_limit:
  enabled: false              # Enable rate limiting (default: false)
  requests_per_second: 100    # Global rate limit (default: 100)
  burst_size: 200             # Burst allowance (default: 200)
```

Tenants Section
```yaml
tenants:
  acme-corp:
    default_policy: gdpr-strict
    rate_limit:
      requests_per_second: 50
      burst_size: 100
  startup-inc:
    default_policy: default
    rate_limit:
      requests_per_second: 200
      burst_size: 400
```

Environment Variable Overrides
| Environment Variable | Overrides | Example |
|---|---|---|
| GUARDAI_SESSION_SECRET | session.secret | 32-byte production secret |
| GUARDAI_DETECTOR_URL | detector.advanced_url | http://detector:9090 |
| RUST_LOG | Log level filter | guardai_server=info,tower_http=info |
Session Backend Selection
| Backend | Use Case | Status |
|---|---|---|
| Sealed (default) | Stateless deployments, horizontal scaling, edge deployments | Available |
| Memory | Development, testing, single-instance demos | Planned -- not yet implemented |
| Redis | Multi-instance production, shared state, server-side session management | Planned -- not yet implemented |
Sealed Backend
The sealed backend is the only currently available session backend. Session state is encrypted (AES-GCM) and returned to the client as an opaque blob. The client must carry this blob between transform and rehydrate calls.
Advantages:
- No server-side state, no external dependencies, and unconstrained horizontal scaling.
- The client controls the session lifecycle.
Considerations:
- Blob size grows with entity count.
- The client must carry the blob between transform and rehydrate requests.
- Best suited when entity counts per request are moderate (< 100 entities).
Planned backends (not yet available): the `memory` and `redis` backends are planned for a future release to support server-side session management and multi-instance deployments with shared state.
Key Rotation Procedure
OGuardAI supports key rotation for session encryption without downtime.
Step 1: Generate New Secret
```bash
# Generate a cryptographically random 32-byte secret
openssl rand -base64 32
```

Step 2: Configure Key Ring
Update oguardai.yaml (or Kubernetes secret) with both the old and new keys:
```yaml
session:
  secret: "new-production-secret-32-bytes!!"
  # The old key is retained internally with kid=0
  # New sessions use the new key with kid=1
```

Step 3: Deploy
Deploy the updated configuration. The behavior during rotation:
| Session Type | Behavior |
|---|---|
| New sessions created after deployment | Encrypted with new key (kid=1) |
| Existing sessions created before deployment | Still decryptable with old key (kid=0) |
Step 4: Wait for TTL Expiry
After ttl_seconds has elapsed since the last session was created with the old key, all old sessions have expired.
Step 5: Remove Old Key
After the TTL period, the old key can be safely removed. All remaining sessions use the new key.
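The kid-based key ring can be pictured with a toy implementation. OGuardAI seals sessions with AES-GCM; the stand-in below uses stdlib HMAC-signed JSON purely to illustrate how a key id embedded in the blob selects the decryption key during rotation — it is not the real wire format:

```python
# Toy key-ring sketch for the rotation flow above. ASSUMPTION: HMAC-signed
# JSON stands in for AES-GCM sealing; only the kid-lookup logic mirrors the
# documented behavior (new sessions use kid=1, old blobs still unseal via kid=0).
import base64, hashlib, hmac, json

KEYS = {0: b"old-production-secret-32-bytes!!",
        1: b"new-production-secret-32-bytes!!"}
ACTIVE_KID = 1  # new sessions are sealed with the newest key

def seal(state: dict) -> str:
    body = json.dumps({"kid": ACTIVE_KID, "state": state}).encode()
    tag = hmac.new(KEYS[ACTIVE_KID], body, hashlib.sha256).hexdigest()
    return base64.b64encode(body).decode() + "." + tag

def unseal(blob: str) -> dict:
    encoded, tag = blob.rsplit(".", 1)
    body = base64.b64decode(encoded)
    kid = json.loads(body)["kid"]
    key = KEYS.get(kid)  # old sessions still resolve via kid=0
    if key is None or not hmac.compare_digest(
            hmac.new(key, body, hashlib.sha256).hexdigest(), tag):
        raise ValueError("session_tampered")
    return json.loads(body)["state"]

blob = seal({"e_001": "julia@firma.de"})
assert unseal(blob) == {"e_001": "julia@firma.de"}
```

Removing kid=0 from the ring (Step 5) makes any remaining old blobs fail to unseal, which is safe once their TTL has elapsed.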
Key Rotation Timeline
```
Time ------------------------------------------------------------>

  Old Key Active  |  Both Keys Active  |  New Key Only
  (all sessions)  |  (old sessions     |  (old sessions
                  |   still decrypt)   |   expired)
                  |                    |
              Deploy               TTL expires,
              new key              remove old key
```

Monitoring and Alerting
Health Check
The /v1/health endpoint returns the overall system status:
```bash
curl http://localhost:3000/v1/health
```

Response:
```json
{
  "status": "healthy",
  "version": "0.1.0",
  "uptime_seconds": 3600.5,
  "components": {
    "detector": {"status": "healthy"},
    "session": {"status": "healthy"},
    "policy": {"status": "healthy"}
  }
}
```

Status values: healthy, degraded (partial functionality), unhealthy (service failure).
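A natural way to derive the overall status from the component statuses is a worst-wins rule. The guide defines the three status values but not the exact aggregation, so the helper below is an assumption:

```python
# ASSUMPTION: overall health is the worst status among components
# (unhealthy > degraded > healthy). The real aggregation rule is internal
# to OGuardAI's /v1/health handler.
def overall_status(components: dict) -> str:
    statuses = {c["status"] for c in components.values()}
    if "unhealthy" in statuses:
        return "unhealthy"
    if "degraded" in statuses:
        return "degraded"
    return "healthy"

components = {"detector": {"status": "degraded"},
              "session": {"status": "healthy"},
              "policy": {"status": "healthy"}}
print(overall_status(components))  # degraded
```

This matches the monitoring table below: a failed detector sidecar degrades the service (builtin fallback) rather than taking it down.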
What to Monitor
| Metric | Source | Alert Threshold |
|---|---|---|
| HTTP response codes | Access logs / reverse proxy | 5xx rate > 1% |
| Request latency (p95) | Structured logs (latency_ms field) | > 500ms for transform, > 200ms for rehydrate |
| Health check status | /v1/health endpoint | Status != "healthy" |
| Entity detection rate | Structured logs (entity_count field) | Sudden drop may indicate detector failure |
| Session seal/unseal errors | Structured logs (error events) | Any session_expired or session_tampered spike |
| Rate limit rejections | Structured logs | Sustained 429 responses |
| Container restarts | Kubernetes / Docker | > 0 in 5-minute window |
| Memory usage | Container metrics | > 80% of limit |
| CPU usage | Container metrics | Sustained > 70% |
| Detector sidecar | Health check components | Status = "degraded" (fallback to builtin only) |
Recommended Alert Rules
```yaml
# Example Prometheus alerting rules
groups:
  - name: guardai
    rules:
      - alert: OGuardAIUnhealthy
        expr: probe_success{job="guardai-health"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "OGuardAI health check failing"

      - alert: OGuardAIHighLatency
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{job="guardai"}[5m])) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "OGuardAI p95 latency above 500ms"

      - alert: OGuardAIHighErrorRate
        expr: rate(http_requests_total{job="guardai",status=~"5.."}[5m]) / rate(http_requests_total{job="guardai"}[5m]) > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "OGuardAI 5xx error rate above 1%"
```

Log Aggregation
OGuardAI emits structured JSON logs compatible with all major SIEM and log aggregation systems:
```json
{
  "timestamp": "2026-04-15T10:30:00Z",
  "level": "INFO",
  "target": "guardai_server::pipeline",
  "message": "transform completed",
  "request_id": "req-abc-123",
  "session_id": "sess-def-456",
  "tenant_id": "acme-corp",
  "entity_count": 3,
  "latency_ms": 12.5,
  "policy": "gdpr-strict"
}
```

Forward logs to your SIEM using standard log collectors (Fluent Bit, Fluentd, Vector, Filebeat).
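Because the logs are structured JSON, latency percentiles can be computed directly from the latency_ms field, for example as a quick offline check before dashboards exist. The snippet below uses synthetic log lines and a nearest-rank p95:

```python
# Offline sketch: compute p95 latency from the latency_ms field of the
# structured JSON logs shown above. Log values here are synthetic.
import json

log_lines = [
    json.dumps({"message": "transform completed", "latency_ms": ms})
    for ms in [10.0, 12.5, 11.0, 480.0, 9.5, 13.0, 12.0, 10.5, 11.5, 600.0]
]

def p95_latency(lines):
    latencies = sorted(json.loads(line)["latency_ms"] for line in lines
                       if "latency_ms" in line)
    # nearest-rank percentile: index ceil(0.95 * n) - 1
    idx = max(0, -(-len(latencies) * 95 // 100) - 1)
    return latencies[idx]

print(p95_latency(log_lines))  # 600.0
```

The same field feeds the "Request latency (p95)" alert threshold in the monitoring table above.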
Troubleshooting
Server does not start
Symptom: oguardai-server exits immediately.
Causes and fixes:
| Cause | Fix |
|---|---|
| Port already in use | Change server.port in config or stop conflicting process |
| Invalid YAML config | Run oguardai config validate or check YAML syntax |
| Missing policies directory | Create policies/ directory or update policy.directory |
"session expired" errors on rehydrate
Symptom: Rehydrate returns {"error": {"code": "session_expired"}}.
Causes and fixes:
| Cause | Fix |
|---|---|
| Session TTL exceeded | Increase session.ttl_seconds or call rehydrate sooner |
| Different session secret between transform and rehydrate | Ensure all server instances use the same GUARDAI_SESSION_SECRET |
| Key rotation mid-flight | Ensure both old and new keys are active during rotation |
Detector sidecar not connecting
Symptom: Health check shows "detector": {"status": "degraded"}.
Causes and fixes:
| Cause | Fix |
|---|---|
| Sidecar not started | Check docker compose logs detector |
| Wrong URL | Verify detector.advanced_url matches sidecar address |
| Sidecar still loading models | Wait for sidecar health check to pass (may take 20-30s on first start) |
| Network isolation | Ensure server and detector are on the same Docker/Kubernetes network |
High latency on first request
Symptom: First request takes significantly longer than subsequent requests.
Cause: Regex patterns and detection models are compiled on first use.
Fix: This is expected behavior. Subsequent requests reuse compiled patterns. For consistent latency, send a warmup request after deployment.
Entity not detected
Symptom: Known PII is not being replaced.
Causes and fixes:
| Cause | Fix |
|---|---|
| Entity type not in policy | Check policy YAML includes the entity type |
| Below confidence threshold | Lower threshold in detect request or policy |
| Builtin-only mode | Names and companies require the Python NER sidecar |
| Non-standard format | Builtin patterns cover common formats; unusual formats may need custom patterns |
Session seal/unseal failures
Symptom: "session": {"status": "unhealthy"} in health check or session_tampered errors.
Causes and fixes:
| Cause | Fix |
|---|---|
| Wrong or missing session secret | Verify GUARDAI_SESSION_SECRET is set and matches across all instances |
| Key rotation mid-flight | Ensure both old and new keys are active during rotation period |
| Tampered or corrupted blob | Client must pass the exact session_state blob received from transform |
| Session TTL exceeded | Increase session.ttl_seconds or call rehydrate sooner |
Rate Limiting in Multi-Instance Deployments
OGuardAI's rate limiter operates per-instance. Each server process maintains independent rate limit buckets.
Implications for Kubernetes / multi-instance:
| Deployment | Effective Rate Limit | Example |
|---|---|---|
| Single instance, limit=100/s | 100/s total | As configured |
| 3 instances, limit=100/s each | Up to 300/s total | Clients hitting different instances get 100/s each |
| 3 instances behind load balancer | ~100/s per client (sticky) or ~300/s (round-robin) | Depends on LB strategy |
Recommended configurations:
For strict per-client limits: Use sticky sessions (session affinity) in your load balancer so each client always hits the same instance.
For global limits: Divide the desired global limit by the number of instances:
```yaml
# 3 instances, want 300 req/s global -> 100/s per instance
rate_limit:
  enabled: true
  requests_per_second: 100
  burst_size: 200
```

For shared rate limiting (advanced): Deploy a Redis-backed rate limiter in front of OGuardAI (e.g., Kong, Envoy, or API gateway rate limiting). OGuardAI's built-in rate limiter then serves as a secondary defense.
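The division guidance above can be sketched as a one-line calculation; rounding down keeps the aggregate at or below the global target when the limit does not divide evenly:

```python
# Sketch of "divide the global limit by instance count". With per-instance
# token buckets, the aggregate limit is roughly per_instance * replicas,
# so each instance should get global // replicas.
def per_instance_limit(global_rps: int, replicas: int) -> int:
    if replicas < 1:
        raise ValueError("need at least one replica")
    return global_rps // replicas  # round down: aggregate stays <= global

assert per_instance_limit(300, 3) == 100       # matches the example config
assert per_instance_limit(100, 3) * 3 <= 100   # never exceeds the target
```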
Metrics in Multi-Instance Deployments
Each OGuardAI instance exposes its own /metrics endpoint. Prometheus scrapes all instances independently.
Prometheus scrape config:
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'guardai'
    kubernetes_sd_configs:
      - role: pod
        selectors:
          - role: pod
            label: app=oguardai-server
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_ip]
        target_label: __address__
        replacement: '$1:3000'
```

Aggregation in PromQL:
```
# Total transforms across all instances
sum(guardai_transforms_total)

# Per-instance error rate
rate(guardai_errors_total[5m])

# p95 latency across all instances
histogram_quantile(0.95, sum(rate(guardai_transform_duration_seconds_bucket[5m])) by (le))
```

Key point: per-instance metrics are the standard pattern for Prometheus-based observability; no shared metrics backend is needed.
Detector Mode Selection
Quick Decision Guide
| Your Use Case | Recommended Mode | Why |
|---|---|---|
| Low-latency API protection | builtin | Sub-millisecond, no dependencies |
| Full PII coverage (names, companies) | both | Builtin + NER gives best coverage |
| Maximum accuracy, latency acceptable | advanced | All detection via trained models |
| NER sidecar sometimes unavailable | both | Graceful fallback to builtin |
How to Check Current Mode
```bash
# Health endpoint shows detector status
curl http://localhost:3000/v1/health | jq '.components.detector'

# Capabilities shows what entity types are available
curl http://localhost:3000/v1/capabilities | jq '.entity_types[].name'

# Diagnostics shows full config
curl http://localhost:3000/v1/diagnostics | jq '.detector_mode'
```

NER Sidecar Availability
When detector.mode: both is configured:
| NER Status | Health Shows | Capabilities Shows | Behavior |
|---|---|---|---|
| Running + healthy | builtin_and_ner (full entity detection) | 18 entity types | Full detection |
| Not running | Same (optimistic) | 18 types (optimistic) | 5s timeout per request, then fallback to builtin |
| Not configured | builtin_only (person/company/location unavailable) | 15 entity types | Builtin only, no timeout |
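The "timeout, then fall back to builtin" row can be pictured as a try/except around the sidecar call. The function names below are illustrative, not the server's internals; the real implementation enforces the 5s timeout at the HTTP client level:

```python
# Sketch of the documented fallback in detector.mode "both": if the NER
# sidecar call fails (timeout, connection error), serve builtin-only results.
# Function names are hypothetical illustrations.
def detect_with_fallback(text, advanced_detect, builtin_detect):
    """Return builtin + advanced results; on sidecar failure, builtin only."""
    builtin_entities = builtin_detect(text)
    try:
        # the real server enforces a 5s timeout on this call
        return builtin_entities + advanced_detect(text)
    except Exception:
        return builtin_entities

def builtin_detect(text):
    return [{"type": "email"}] if "@" in text else []

def broken_sidecar(text):
    raise ConnectionError("sidecar down")

result = detect_with_fallback("mail julia@firma.de", broken_sidecar, builtin_detect)
print(result)  # [{'type': 'email'}]
```

This is why a down sidecar shows up as added per-request latency rather than hard failures: every request pays the timeout before falling back.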
Latency Troubleshooting
If transform latency is high (>1 second p50):
- Check the detector mode: `curl /v1/diagnostics | jq .detector_mode`
- If the mode is "Both": check whether the NER sidecar is running at the configured URL.
- If NER is down: either start the sidecar or switch to `detector.mode: builtin`.
- Verify: `curl /metrics | grep transform_duration_seconds`
This document is maintained alongside the OGuardAI source code. For the latest version, see the repository at docs/deployment-guide.md.