Failure Modes
How OGuardAI behaves when components fail, with recovery procedures for each scenario
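OGuardAI is fail-secure by default: if protection cannot be enforced, the request fails rather than returning unprotected text. The one intentional exception is NER sidecar failure, where detection falls back to the builtin detectors with reduced coverage.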
Failure Matrix
| Component Failure | OGuardAI Behavior | Fail Mode | Recovery |
|---|---|---|---|
| NER sidecar down | Falls back to builtin regex detection. Person, company, and location entities are not detected. Capabilities endpoint shows ner_active: false. Transform/detect responses include a warning. | Fail-open (reduced coverage) | Restart NER sidecar. Check /v1/capabilities for ner_active status. |
| NER sidecar slow (>5s) | Request times out waiting for NER, then falls back to builtin detection. Adds up to 5 seconds of latency per request. | Fail-open (reduced coverage) | Scale the NER sidecar or reduce detector.timeout_ms. |
| Redis unavailable | If Redis is the session backend, session operations fail. Sealed-session mode (default) is unaffected since it uses no external storage. Revocation lookups fail if Redis is the revocation backend. | Fail-closed (session ops rejected) | Switch to sealed sessions (stateless) or restore Redis. File-backed revocation works without Redis. |
| Disk full | File-backed revocation writes fail. Audit log writes fail. Server continues processing requests but revocation state may not persist. | Fail-closed (revocation writes rejected) | Free disk space. Revocation state in memory remains consistent until restart. |
| Config file invalid | Server refuses to start. Validation errors printed to stderr with line numbers. | Fail-closed (startup blocked) | Fix config. Run oguardai config validate before deploying. |
| Policy file missing | Default policy is applied. If no default policy exists, all entities are tokenized (maximum protection). Warning emitted to logs. | Fail-secure (default protection) | Add the missing policy file. Use oguardai config validate to check policy references. |
| Session blob expired | Clean error returned: GUARDAI_SESSION_EXPIRED. No data is leaked. Client must re-transform the original text to get a new session. | Fail-closed (request rejected) | Re-submit the original text through /v1/transform. Increase session.ttl_seconds if expirations are frequent. |
| Session blob tampered | AES-256-GCM authentication tag verification fails. Request rejected with GUARDAI_SESSION_INVALID. No data is leaked. | Fail-closed (request rejected) | Client must re-transform from original text. Investigate source of tampering. |
| Rate limit exceeded | HTTP 429 returned with Retry-After header. No data processed. | Fail-closed (request rejected) | Wait for rate limit window to reset. Increase rate_limit.requests_per_second or add instances. |
| Output guard detects new PII | Depending on config: mask replaces new PII with type labels, block rejects the entire response, warn passes through with warning flag. | Configurable | Review LLM output patterns. Adjust output guard sensitivity or action mode. |
| Session key rotated | All existing sealed session blobs become invalid. New transforms work normally. | Fail-closed (old sessions rejected) | Expected behavior during key rotation. Warn users that in-flight sessions will be rejected and must be re-transformed. |
| Prompt injection detected | Request is rejected or the injected content is neutralized, depending on prompt_security.action config. | Fail-closed (default) | Review rejected input. Adjust prompt security sensitivity if false positives occur. |
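
The recovery column above implies some client-side handling. The following is a minimal sketch of how a caller might map a failed response to a recovery action. The `Recovery` type and `plan_recovery` function are hypothetical names, the error code is assumed to have already been extracted from the response (the body format is not specified here), and only the delay-seconds form of Retry-After is parsed.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Recovery:
    """Hypothetical description of what the client should do next."""
    action: str               # "wait", "retransform", or "fail"
    wait_seconds: float = 0.0


def plan_recovery(status: int,
                  error_code: Optional[str],
                  headers: Mapping[str, str]) -> Recovery:
    """Map a failed OGuardAI response to a client-side recovery action."""
    if status == 429:
        # Rate limited: honor Retry-After when it is a plain number of seconds.
        # An HTTP-date value is not parsed in this sketch; fall back to 1 second.
        raw = headers.get("Retry-After", "").strip()
        wait = float(raw) if raw.isdigit() else 1.0
        return Recovery("wait", wait)
    if error_code in ("GUARDAI_SESSION_EXPIRED", "GUARDAI_SESSION_INVALID"):
        # Expired or tampered session blob: re-submit the original text
        # through /v1/transform to obtain a new session.
        return Recovery("retransform")
    # Anything else is surfaced to the caller rather than retried blindly.
    return Recovery("fail")
```

The lookup of `Retry-After` is case-sensitive here; a real HTTP client usually exposes a case-insensitive header mapping.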
Degradation Matrix
What still works when a component is unavailable:
| Component Down | Transform | Rehydrate | Detect | Revoke | Output Guard |
|---|---|---|---|---|---|
| NER sidecar | Partial (builtin only) | Full | Partial (builtin only) | Full | Full |
| Redis | Full (sealed sessions) | Full (sealed sessions) | Full | Degraded (no shared state) | Full |
| Disk | Full | Full | Full | Degraded (no persistence) | Full |
| Policy files | Full (default policy) | Full | Full | Full | Full (default action) |
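
For runbooks or alert routing, the matrix can be encoded as data. This is a sketch only: the component and operation keys are hypothetical identifiers chosen for illustration, and the values are copied from the table rows.

```python
from typing import Dict

OPERATIONS = ("transform", "rehydrate", "detect", "revoke", "output_guard")

# Only non-"full" cells are listed; every other cell in the matrix is "full".
# Keys are illustrative names, not identifiers defined by OGuardAI.
DEGRADATION: Dict[str, Dict[str, str]] = {
    "ner_sidecar": {"transform": "partial", "detect": "partial"},
    "redis": {"revoke": "degraded"},
    "disk": {"revoke": "degraded"},
    "policy_files": {},
}


def status(component_down: str, operation: str) -> str:
    """Return "full", "partial", or "degraded" for one matrix cell."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    return DEGRADATION[component_down].get(operation, "full")
```

For example, `status("redis", "revoke")` returns `"degraded"`, while `status("redis", "rehydrate")` returns `"full"` because sealed sessions need no external storage.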
"Partial" means the operation succeeds but with reduced entity coverage (15 of 18 types). "Degraded" means the operation succeeds but with reduced durability or consistency.
Design Principles
- Never return unprotected text. If no detector can run, the request fails rather than passing raw PII through.
- Cryptographic verification on every unseal. Tampered or expired session blobs are always rejected.
- Stateless by default. The sealed session model has no external dependencies. Redis and file backends are optional enhancements.
- Graceful NER fallback. NER unavailability reduces coverage but does not block the pipeline. This is the one intentional fail-open path, clearly documented and surfaced via health checks.
- Revoked values are permanently suppressed. Even if the session blob is valid, revoked entities always return [DELETED].
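
The interaction between the fail-secure rule and the NER fallback can be sketched as follows. This is an illustration under stated assumptions, not OGuardAI's implementation: `detect`, `builtin_detect`, `ProtectionUnavailable`, and the warning string are hypothetical, and a single email regex stands in for the builtin detectors.

```python
import re
from typing import Callable, List, Tuple

Entity = Tuple[str, str]  # (entity_type, matched_text)


class ProtectionUnavailable(Exception):
    """Raised when no detector can run, so the request must fail."""


# Stand-in for the builtin regex detectors (hypothetical, email only).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def builtin_detect(text: str) -> List[Entity]:
    return [("EMAIL", m.group(0)) for m in EMAIL_RE.finditer(text)]


def detect(text: str,
           ner_detect: Callable[[str], List[Entity]],
           fallback: Callable[[str], List[Entity]] = builtin_detect,
           ) -> Tuple[List[Entity], List[str]]:
    """Try NER first, fall back to builtin detection, otherwise fail."""
    try:
        return ner_detect(text), []
    except Exception:
        pass  # NER unavailable: the one intentional fail-open path
    try:
        # Reduced coverage is surfaced to the caller as a warning.
        return fallback(text), ["ner_unavailable: reduced entity coverage"]
    except Exception as exc:
        # No detector could run: fail the request instead of returning raw text.
        raise ProtectionUnavailable("no detector available") from exc
```

The key design point is that the fallback path always reports its reduced coverage, and the only path that returns without any detection having run is the one that raises.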
Monitoring Recommendations
- Poll /v1/capabilities periodically. Check the ner_active field to detect NER sidecar availability.
- Track guardai_errors_total by error code. A spike in SESSION_EXPIRED may indicate TTL misconfiguration.
- Track guardai_rate_limit_rejections_total. Sustained rejections indicate capacity issues.
- Track guardai_output_guard_blocks_total. A high block rate may indicate that the LLM is generating PII.
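
A capabilities poll could look like the sketch below. It assumes the endpoint returns a JSON object with a top-level boolean `ner_active` field and needs no authentication; both are assumptions, and `fetch_capabilities` and `ner_is_active` are hypothetical helper names.

```python
import json
import urllib.request


def fetch_capabilities(base_url: str, timeout: float = 5.0) -> dict:
    """GET /v1/capabilities and decode the JSON body."""
    url = f"{base_url.rstrip('/')}/v1/capabilities"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def ner_is_active(capabilities: dict) -> bool:
    """Treat a missing or non-true ner_active field as NER being down."""
    return capabilities.get("ner_active") is True
```

A monitoring job would call `ner_is_active(fetch_capabilities(base_url))` on a schedule and alert when it returns `False`. Treating a missing field as inactive keeps the check conservative.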