OGuardAI
Operations

Deployment Guide

Deploy OGuardAI from local development to production Kubernetes clusters

This guide covers every deployment mode from local development to production Kubernetes clusters.

Note: The current runtime uses sealed sessions (client-carried encrypted blobs), so no shared session store is required. Each instance is stateless.


Table of Contents

  1. Prerequisites
  2. Quick Start (Docker)
  3. Development (Local Build)
  4. Production (Docker Compose)
  5. Kubernetes (Helm)
  6. Configuration Reference
  7. Session Backend Selection
  8. Key Rotation Procedure
  9. Monitoring and Alerting
  10. Troubleshooting
  11. Rate Limiting in Multi-Instance Deployments
  12. Metrics in Multi-Instance Deployments
  13. Detector Mode Selection

Prerequisites

Requirement | Version | Purpose
Docker | 24+ | Container runtime
Docker Compose | 2.20+ | Multi-service orchestration
Helm | 3.12+ | Kubernetes deployment (optional)
Kubernetes | 1.27+ | Container orchestration (optional)
Rust | 1.88+ | Building from source (optional)
Python | 3.10+ | NER detector sidecar (optional)
Node.js | 20+ | SDK and MCP server (optional)

Quick Start (Docker)

Run OGuardAI with a single command using the pre-built Docker image:

docker run -p 8080:3000 \
  -e GUARDAI_SESSION_SECRET=$(openssl rand -base64 32) \
  ghcr.io/oronts/oronts-guardai/oguardai-server:latest

This starts OGuardAI with:

  • Builtin regex detectors (30+ patterns)
  • Sealed session backend (client-carried encrypted blobs)
  • Dev auth mode (all requests accepted -- not for production)
  • Default policy

Verify the server is running:

curl http://localhost:8080/v1/health
# {"status":"healthy","version":"0.1.0","uptime_seconds":1.2}

Test a transform:

curl -X POST http://localhost:8080/v1/transform \
  -H "Content-Type: application/json" \
  -d '{"input": "Contact julia@firma.de for help"}'

Expected response:

{
  "safe_text": "Contact `{{email:e_001}}` for help",
  "session_id": "...",
  "session_state": "...",
  "entities": [{"token": "`{{email:e_001}}`", "type": "email"}]
}
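
This exchange can be sketched in a few lines of Python. The snippet below illustrates the request/response shape only -- the naive regex and counter are stand-ins for the server's builtin detectors, not its implementation:

```python
import re

def transform(text: str) -> dict:
    """Toy transform: replace each email with a placeholder token
    and report the detected entities (response shape only)."""
    entities = []
    counter = 0

    def replace(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        token = f"`{{{{email:e_{counter:03d}}}}}`"
        entities.append({"token": token, "type": "email"})
        return token

    # Deliberately simplified pattern; the builtin detectors are stricter.
    safe_text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", replace, text)
    return {"safe_text": safe_text, "entities": entities}

print(transform("Contact julia@firma.de for help")["safe_text"])
# Contact `{{email:e_001}}` for help
```

The real response additionally carries session_id and the encrypted session_state blob, which the client must pass back on rehydrate.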

Development (Local Build)

Build from Source

# Clone the repository
git clone https://github.com/oronts/oguardai.git
cd oguardai

# Build all Rust crates
cargo build --workspace

# Run all tests
cargo test --workspace

# Start the server with defaults
cargo run -p oguardai-server -- --config oguardai.yaml

Minimal Development Config

Create oguardai.yaml for local development:

server:
  host: "127.0.0.1"
  port: 3000

auth:
  mode: dev

session:
  backend: sealed
  secret: "dev-only-secret-change-in-prod!!"
  ttl_seconds: 3600

detector:
  mode: builtin

policy:
  default: default
  directory: policies

transform:
  context_strategy: full
  max_context_tokens: 4096

Running with Python Detector (Development)

# Terminal 1: Start Python NER detector
cd apps/detector-py
uv sync
uv run uvicorn guardai_detector_service.main:app --host 0.0.0.0 --port 9090

# Terminal 2: Start Rust server with detector URL
GUARDAI_DETECTOR_URL=http://localhost:9090 \
  cargo run -p oguardai-server -- --config oguardai.yaml

Production (Docker Compose)

Docker Compose deploys two services: the OGuardAI server and the Python NER detector. Sessions use the sealed backend (client-carried encrypted blobs), so no external session store is required.

Environment Setup

Create a .env file (never commit this to version control):

# Required: 32-byte session encryption secret
GUARDAI_SESSION_SECRET=your-production-secret-32-bytes!!

# Optional: log level
GUARDAI_LOG_LEVEL=guardai_server=info,tower_http=info

Start the Full Stack

docker compose -f deploy/docker/docker-compose.yml up --build

This starts:

Service | Port | Description
server | 3000 | OGuardAI Rust server
detector | 9090 | Python NER detector (GLiNER/spaCy)

Production Docker Compose Configuration

The production docker-compose.yml at deploy/docker/docker-compose.yml includes:

  • Health checks on both services (HTTP health endpoints for server and detector).
  • Restart policy: unless-stopped for automatic recovery.
  • Volume mounts: Policies directory mounted read-only.
  • Dependency ordering: Server waits for detector to be healthy before starting.

Minimal Docker Compose (Server Only)

For deployments that do not need the advanced NER detector:

docker compose -f deploy/docker/docker-compose.minimal.yml up --build

This starts only the OGuardAI server with builtin detectors and sealed sessions.


Kubernetes (Helm)

Install the Helm Chart

# Add the OGuardAI Helm repository (when published)
helm repo add guardai https://charts.OGuard.ai
helm repo update

# Install with default values
helm install oguardai guardai/guardai \
  --namespace oguardai \
  --create-namespace \
  --set session.secret="your-production-secret-32-bytes!!"

# Or install from local chart
helm install oguardai deploy/helm/oguardai \
  --namespace oguardai \
  --create-namespace \
  --set session.secret="your-production-secret-32-bytes!!"

Production Helm Values

Create values-production.yaml:

server:
  replicaCount: 3
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: "2"
      memory: 1Gi

session:
  backend: sealed
  ttlSeconds: 3600
  # Reference an existing Kubernetes secret
  existingSecret: oguardai-session-secret
  existingSecretKey: session-secret

auth:
  mode: api_key
  api_keys:
    - key: sk-your-api-key-here
      identity: my-service
      scopes:
        - transform
        - rehydrate
        - detect

policy:
  default: default
  directory: /app/policies

transform:
  contextStrategy: full
  maxContextTokens: 4096

# Enable Python NER detector
detector:
  enabled: true
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 2Gi

# Enable horizontal pod autoscaler
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

# Pod security
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

securityContext:
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL

Install with production values:

helm install oguardai deploy/helm/oguardai \
  --namespace oguardai \
  --create-namespace \
  -f values-production.yaml

Using External Secrets

To use an external secret manager (Vault, AWS Secrets Manager) with the Helm chart:

# Create the Kubernetes secret first
kubectl create secret generic oguardai-session-secret \
  --namespace oguardai \
  --from-literal=session-secret="your-production-secret-32-bytes!!"

# Reference it in Helm values
helm install oguardai deploy/helm/oguardai \
  --set session.existingSecret=oguardai-session-secret \
  --set session.existingSecretKey=session-secret

Helm Chart Components

The chart deploys the following Kubernetes resources:

Resource | Purpose
Deployment | OGuardAI server pods
Service | ClusterIP service for internal access
ConfigMap | oguardai.yaml configuration
Secret | Session encryption secret
ServiceAccount | Pod identity
HPA | Horizontal pod autoscaler (optional)

Configuration Reference

The oguardai.yaml file controls all server behavior. Every field has a sensible default.

Server Section

server:
  host: "0.0.0.0"      # Bind address (default: 0.0.0.0)
  port: 3000            # Bind port (default: 3000)

Auth Section

auth:
  mode: api_key         # dev | api_key | jwt (default: dev)
  api_keys:             # Only used when mode=api_key
    - key: sk-...
      identity: my-service
      tenant_id: tenant-1  # Optional: multi-tenancy scope
      scopes:           # Optional: restrict to specific scopes
        - transform
        - rehydrate
  # JWT config (only used when mode=jwt)
  jwt:
    secret: "your-hmac-secret"
    issuer: "https://auth.example.com"

Note: The api_keys entries use key, identity, tenant_id, and scopes fields.

Session Section

session:
  backend: sealed       # sealed (default: sealed)
  secret: "..."         # 32-byte encryption secret (REQUIRED for production)
  ttl_seconds: 3600     # Session TTL in seconds (default: 3600)

The GUARDAI_SESSION_SECRET environment variable overrides session.secret.

Note: Only the sealed backend is currently implemented. The memory and redis backends are planned for a future release.

Detector Section

detector:
  mode: builtin         # builtin | advanced | both (default: builtin)
  advanced_url: "http://detector:9090"  # URL of Python NER sidecar
  default_language: null  # ISO 639-1 code, or null for auto-detect

Mode | Behavior
builtin | Rust regex patterns only (30+ patterns)
advanced | Python NER sidecar only
both | Both builtin and advanced; merge results
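
For mode both, one plausible merge strategy can be sketched as follows. This is an assumption for illustration -- the server's actual merge rules, confidence handling, and field names may differ:

```python
def merge_detections(builtin, advanced):
    """Sketch of merging builtin and NER results: keep all spans,
    and where two spans overlap, keep the higher-confidence one.
    (Illustrative only, not the server's actual algorithm.)"""
    merged = []
    for cand in sorted(builtin + advanced, key=lambda e: -e["confidence"]):
        overlaps = any(
            cand["start"] < kept["end"] and kept["start"] < cand["end"]
            for kept in merged
        )
        if not overlaps:
            merged.append(cand)
    return sorted(merged, key=lambda e: e["start"])

builtin = [{"type": "email", "start": 8, "end": 22, "confidence": 0.99}]
advanced = [
    {"type": "person", "start": 8, "end": 13, "confidence": 0.90},
    {"type": "company", "start": 30, "end": 35, "confidence": 0.85},
]
print(merge_detections(builtin, advanced))
# keeps the email span (higher confidence) plus the non-overlapping company
```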

Policy Section

policy:
  default: default      # Default policy name (default: "default")
  directory: policies   # Directory containing policy YAML files (default: "policies")

Transform Section

transform:
  context_strategy: full   # full | type_summary | referenced_only | none (default: full)
  max_context_tokens: 4096 # Max tokens for entity_context (default: 4096)
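
The four context strategies can be read as progressively smaller views of the detected entities. The sketch below is a guess at their semantics from the option names -- the field names and exact output are assumptions, not the server's documented format:

```python
def build_entity_context(entities, strategy):
    """Hypothetical illustration of the context_strategy options."""
    if strategy == "none":
        return None  # no entity context at all
    if strategy == "type_summary":
        summary = {}
        for e in entities:  # counts per entity type only
            summary[e["type"]] = summary.get(e["type"], 0) + 1
        return summary
    if strategy == "referenced_only":
        return [e for e in entities if e.get("referenced")]
    return entities  # "full": every detected entity

entities = [
    {"token": "{{email:e_001}}", "type": "email", "referenced": True},
    {"token": "{{email:e_002}}", "type": "email", "referenced": False},
]
print(build_entity_context(entities, "type_summary"))
# {'email': 2}
```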

Output Protection Section

output_protection:
  enabled: false        # Enable output guard (default: false)
  mode: strict          # strict | permissive (default: strict)
  default_action: mask  # mask | block | tokenize | allow (default: mask)
  exempt_types:         # Entity types exempt from output protection
    - greeting

Prompt Security Section

prompt_security:
  enabled: true         # Enable prompt injection scanning (default: true)
  action: warn          # warn | strip | block (default: warn)

File Upload Section

file_upload:
  max_size_bytes: 52428800  # 50MB default

Rate Limit Section

rate_limit:
  enabled: false        # Enable rate limiting (default: false)
  requests_per_second: 100  # Global rate limit (default: 100)
  burst_size: 200       # Burst allowance (default: 200)

Tenants Section

tenants:
  acme-corp:
    default_policy: gdpr-strict
    rate_limit:
      requests_per_second: 50
      burst_size: 100
  startup-inc:
    default_policy: default
    rate_limit:
      requests_per_second: 200
      burst_size: 400

Environment Variable Overrides

Environment Variable | Overrides | Example
GUARDAI_SESSION_SECRET | session.secret | 32-byte production secret
GUARDAI_DETECTOR_URL | detector.advanced_url | http://detector:9090
RUST_LOG | Log level filter | guardai_server=info,tower_http=info

Session Backend Selection

Backend | Use Case | Status
Sealed (default) | Stateless deployments, horizontal scaling, edge deployments | Available
Memory | Development, testing, single-instance demos | Planned -- not yet implemented
Redis | Multi-instance production, shared state, server-side session management | Planned -- not yet implemented

Sealed Backend

The sealed backend is the only currently available session backend. Session state is encrypted (AES-GCM) and returned to the client as an opaque blob. The client must carry this blob between transform and rehydrate calls.

Advantages:

  • No server-side state: horizontal scaling is unbounded and no external dependencies are required.
  • The client controls the session lifecycle.

Considerations:

  • Blob size grows with the entity count.
  • The client must carry the blob between requests.
  • Entity counts per request should stay moderate (fewer than 100 entities).

Planned backends (not yet available): memory and redis backends are planned for a future release to support server-side session management and multi-instance deployments with shared state.
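
The sealed-blob round trip can be sketched in a few lines. This toy version only signs the state with HMAC rather than encrypting it with AES-GCM as the real backend does (Python's standard library has no AES-GCM), but it shows the client-carried blob and the TTL check:

```python
import base64, hashlib, hmac, json, time

SECRET = b"dev-only-secret-change-in-prod!!"

def seal(state: dict, ttl_seconds: int = 3600) -> str:
    """Pack state + expiry into a signed, base64-encoded blob."""
    raw = json.dumps({"state": state, "exp": time.time() + ttl_seconds}).encode()
    sig = hmac.new(SECRET, raw, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(sig + raw).decode()

def unseal(blob: str) -> dict:
    """Verify the signature and expiry, then recover the state."""
    decoded = base64.urlsafe_b64decode(blob)
    sig, raw = decoded[:32], decoded[32:]
    if not hmac.compare_digest(sig, hmac.new(SECRET, raw, hashlib.sha256).digest()):
        raise ValueError("session_tampered")
    payload = json.loads(raw)
    if time.time() > payload["exp"]:
        raise ValueError("session_expired")
    return payload["state"]

blob = seal({"e_001": "julia@firma.de"})
assert unseal(blob) == {"e_001": "julia@firma.de"}
```

Because all state rides in the blob, any instance holding the same secret can unseal it -- which is why every instance must share GUARDAI_SESSION_SECRET.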


Key Rotation Procedure

OGuardAI supports key rotation for session encryption without downtime.

Step 1: Generate New Secret

# Generate a cryptographically random 32-byte secret
openssl rand -base64 32

Step 2: Configure Key Ring

Update oguardai.yaml (or Kubernetes secret) with both the old and new keys:

session:
  secret: "new-production-secret-32-bytes!!"
  # The old key is retained internally with kid=0
  # New sessions use the new key with kid=1

Step 3: Deploy

Deploy the updated configuration. The behavior during rotation:

Session Type | Behavior
New sessions created after deployment | Encrypted with new key (kid=1)
Existing sessions created before deployment | Still decryptable with old key (kid=0)

Step 4: Wait for TTL Expiry

After ttl_seconds has elapsed since the last session was created with the old key, all old sessions have expired.

Step 5: Remove Old Key

After the TTL period, the old key can be safely removed. All remaining sessions use the new key.

Key Rotation Timeline

Time ------------------------------------------------------------>

Old Key Active     |  Both Keys Active  |  New Key Only
(all sessions)     |  (old sessions     |  (old sessions
                   |   still decrypt)   |   expired)
                   |                    |
          Deploy   |         TTL        |  Remove old key
          new key  |        expires     |
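
The kid bookkeeping above can be sketched as a small key ring. This sketch signs with HMAC instead of encrypting with AES-GCM as the server does, but the kid selection logic is the same idea:

```python
import base64, hashlib, hmac, json

# kid -> key. During rotation both entries are present; new sessions
# always use the highest kid, old blobs still verify with kid=0.
KEYS = {
    0: b"old-production-secret-32-bytes!!",
    1: b"new-production-secret-32-bytes!!",
}

def seal(state: dict) -> str:
    kid = max(KEYS)  # always seal with the newest key
    raw = json.dumps({"kid": kid, "state": state}).encode()
    sig = hmac.new(KEYS[kid], raw, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(sig + raw).decode()

def unseal(blob: str) -> dict:
    decoded = base64.urlsafe_b64decode(blob)
    sig, raw = decoded[:32], decoded[32:]
    kid = json.loads(raw)["kid"]  # use the key the blob was sealed with
    if not hmac.compare_digest(sig, hmac.new(KEYS[kid], raw, hashlib.sha256).digest()):
        raise ValueError("session_tampered")
    return json.loads(raw)["state"]
```

Deleting the kid=0 entry (Step 5) makes old blobs unreadable, which is why removal must wait for TTL expiry.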

Monitoring and Alerting

Health Check

The /v1/health endpoint returns the overall system status:

curl http://localhost:3000/v1/health

Response:

{
  "status": "healthy",
  "version": "0.1.0",
  "uptime_seconds": 3600.5,
  "components": {
    "detector": {"status": "healthy"},
    "session": {"status": "healthy"},
    "policy": {"status": "healthy"}
  }
}

Status values: healthy, degraded (partial functionality), unhealthy (service failure).

What to Monitor

Metric | Source | Alert Threshold
HTTP response codes | Access logs / reverse proxy | 5xx rate > 1%
Request latency (p95) | Structured logs (latency_ms field) | > 500ms for transform, > 200ms for rehydrate
Health check status | /v1/health endpoint | Status != "healthy"
Entity detection rate | Structured logs (entity_count field) | Sudden drop may indicate detector failure
Session seal/unseal errors | Structured logs (error events) | Any session_expired or session_tampered spike
Rate limit rejections | Structured logs | Sustained 429 responses
Container restarts | Kubernetes / Docker | > 0 in 5-minute window
Memory usage | Container metrics | > 80% of limit
CPU usage | Container metrics | Sustained > 70%
Detector sidecar | Health check components | Status = "degraded" (fallback to builtin only)

# Example Prometheus alerting rules
groups:
  - name: guardai
    rules:
      - alert: OGuardAIUnhealthy
        expr: probe_success{job="guardai-health"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "OGuardAI health check failing"

      - alert: OGuardAIHighLatency
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{job="guardai"}[5m])) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "OGuardAI p95 latency above 500ms"

      - alert: OGuardAIHighErrorRate
        expr: rate(http_requests_total{job="guardai",status=~"5.."}[5m]) / rate(http_requests_total{job="guardai"}[5m]) > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "OGuardAI 5xx error rate above 1%"

Log Aggregation

OGuardAI emits structured JSON logs compatible with all major SIEM and log aggregation systems:

{
  "timestamp": "2026-04-15T10:30:00Z",
  "level": "INFO",
  "target": "guardai_server::pipeline",
  "message": "transform completed",
  "request_id": "req-abc-123",
  "session_id": "sess-def-456",
  "tenant_id": "acme-corp",
  "entity_count": 3,
  "latency_ms": 12.5,
  "policy": "gdpr-strict"
}

Forward logs to your SIEM using standard log collectors (Fluent Bit, Fluentd, Vector, Filebeat).


Troubleshooting

Server does not start

Symptom: oguardai-server exits immediately.

Causes and fixes:

Cause | Fix
Port already in use | Change server.port in config or stop the conflicting process
Invalid YAML config | Run oguardai config validate or check YAML syntax
Missing policies directory | Create policies/ directory or update policy.directory

"session expired" errors on rehydrate

Symptom: Rehydrate returns {"error": {"code": "session_expired"}}.

Causes and fixes:

Cause | Fix
Session TTL exceeded | Increase session.ttl_seconds or call rehydrate sooner
Different session secret between transform and rehydrate | Ensure all server instances use the same GUARDAI_SESSION_SECRET
Key rotation mid-flight | Ensure both old and new keys are active during rotation

Detector sidecar not connecting

Symptom: Health check shows "detector": {"status": "degraded"}.

Causes and fixes:

Cause | Fix
Sidecar not started | Check docker compose logs detector
Wrong URL | Verify detector.advanced_url matches the sidecar address
Sidecar still loading models | Wait for the sidecar health check to pass (may take 20-30s on first start)
Network isolation | Ensure server and detector are on the same Docker/Kubernetes network

High latency on first request

Symptom: First request takes significantly longer than subsequent requests.

Cause: Regex patterns and detection models are compiled on first use.

Fix: This is expected behavior. Subsequent requests reuse compiled patterns. For consistent latency, send a warmup request after deployment.

Entity not detected

Symptom: Known PII is not being replaced.

Causes and fixes:

Cause | Fix
Entity type not in policy | Check that the policy YAML includes the entity type
Below confidence threshold | Lower the threshold in the detect request or policy
Builtin-only mode | Names and companies require the Python NER sidecar
Non-standard format | Builtin patterns cover common formats; unusual formats may need custom patterns

Session seal/unseal failures

Symptom: "session": {"status": "unhealthy"} in health check or session_tampered errors.

Causes and fixes:

Cause | Fix
Wrong or missing session secret | Verify GUARDAI_SESSION_SECRET is set and matches across all instances
Key rotation mid-flight | Ensure both old and new keys are active during the rotation period
Tampered or corrupted blob | The client must pass the exact session_state blob received from transform
Session TTL exceeded | Increase session.ttl_seconds or call rehydrate sooner

Rate Limiting in Multi-Instance Deployments

OGuardAI's rate limiter operates per-instance. Each server process maintains independent rate limit buckets.

Implications for Kubernetes / multi-instance:

Deployment | Effective Rate Limit | Example
Single instance, limit=100/s | 100/s total | As configured
3 instances, limit=100/s each | Up to 300/s total | Clients hitting different instances get 100/s each
3 instances behind load balancer | ~100/s per client (sticky) or ~300/s (round-robin) | Depends on LB strategy

For strict per-client limits: Use sticky sessions (session affinity) in your load balancer so each client always hits the same instance.

For global limits: Divide the desired global limit by the number of instances:

# 3 instances, want 300 req/s global -> 100/s per instance
rate_limit:
  enabled: true
  requests_per_second: 100
  burst_size: 200

For shared rate limiting (advanced): Deploy a Redis-backed rate limiter in front of OGuardAI (e.g., Kong, Envoy, or API gateway rate limiting). OGuardAI's built-in rate limiter then serves as a secondary defense.
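
To make the per-instance behavior concrete, here is a toy token bucket of the kind each instance runs independently. The parameter names mirror the config fields; the server's internal implementation may differ:

```python
import time

class TokenBucket:
    """Per-instance token bucket: refills at requests_per_second,
    caps at burst_size. Each OGuardAI process has its own bucket,
    so N instances admit up to N times the configured rate."""
    def __init__(self, requests_per_second: float, burst_size: int):
        self.rate = requests_per_second
        self.capacity = burst_size
        self.tokens = float(burst_size)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # would surface to the client as HTTP 429

# 3 instances, 300 req/s global target -> configure 100/s per instance
GLOBAL_LIMIT, INSTANCES = 300, 3
bucket = TokenBucket(requests_per_second=GLOBAL_LIMIT / INSTANCES, burst_size=200)
```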


Metrics in Multi-Instance Deployments

Each OGuardAI instance exposes its own /metrics endpoint. Prometheus scrapes all instances independently.

Prometheus scrape config:

# prometheus.yml
scrape_configs:
  - job_name: 'guardai'
    kubernetes_sd_configs:
      - role: pod
        selectors:
          - role: pod
            label: app=oguardai-server
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_ip]
        target_label: __address__
        replacement: '$1:3000'

Aggregation in PromQL:

# Total transforms across all instances
sum(guardai_transforms_total)

# Per-instance error rate
rate(guardai_errors_total[5m])

# p95 latency across all instances
histogram_quantile(0.95, sum(rate(guardai_transform_duration_seconds_bucket[5m])) by (le))

Key point:

Per-instance metrics are the standard pattern for Prometheus-based observability; no shared metrics backend is needed.


Detector Mode Selection

Quick Decision Guide

Your Use Case | Recommended Mode | Why
Low-latency API protection | builtin | Sub-millisecond, no dependencies
Full PII coverage (names, companies) | both | Builtin + NER gives the best coverage
Maximum accuracy, latency acceptable | advanced | All detection via trained models
NER sidecar sometimes unavailable | both | Graceful fallback to builtin

How to Check Current Mode

# Health endpoint shows detector status
curl http://localhost:3000/v1/health | jq '.components.detector'

# Capabilities shows what entity types are available
curl http://localhost:3000/v1/capabilities | jq '.entity_types[].name'

# Diagnostics shows full config
curl http://localhost:3000/v1/diagnostics | jq '.detector_mode'

NER Sidecar Availability

When detector.mode: both is configured:

NER Status | Health Shows | Capabilities Shows | Behavior
Running + healthy | builtin_and_ner (full entity detection) | 18 entity types | Full detection
Not running | builtin_and_ner (optimistic) | 18 types (optimistic) | 5s timeout per request, then fallback to builtin
Not configured | builtin_only (person/company/location unavailable) | 15 entity types | Builtin only, no timeout
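
The timeout-then-fallback behavior described above can be sketched as follows. The function names here are assumptions for illustration, not server APIs; only the 5-second timeout and the graceful degradation to builtin results come from the table:

```python
import concurrent.futures

NER_TIMEOUT_SECONDS = 5  # matches the per-request timeout described above

def detect(text, builtin_detect, ner_detect, executor):
    """Sketch of 'both' mode: builtin results always, NER results
    only if the sidecar answers within the timeout."""
    builtin_entities = builtin_detect(text)
    future = executor.submit(ner_detect, text)
    try:
        ner_entities = future.result(timeout=NER_TIMEOUT_SECONDS)
    except Exception:  # timeout or sidecar error
        return builtin_entities  # graceful degradation to builtin only
    return builtin_entities + ner_entities

with concurrent.futures.ThreadPoolExecutor() as pool:
    result = detect(
        "Contact julia@firma.de",
        builtin_detect=lambda t: [{"type": "email"}],
        ner_detect=lambda t: [{"type": "person"}],
        executor=pool,
    )
print(result)  # builtin and NER results merged
```

This also explains the latency symptom below: when the sidecar is configured but down, every request pays the full timeout before falling back.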

Latency Troubleshooting

If transform latency is high (>1 second p50):

  1. Check the detector mode: curl /v1/diagnostics | jq .detector_mode
  2. If the mode is both: check whether the NER sidecar is running at the configured URL
  3. If NER is down: either start the sidecar or switch to detector.mode: builtin
  4. Verify: curl /metrics | grep transform_duration_seconds

This document is maintained alongside the OGuardAI source code. For the latest version, see the repository at docs/deployment-guide.md.