Failure Modes

How OGuardAI behaves when components fail, with recovery procedures for each scenario

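OGuardAI is fail-secure by default: if protection cannot be enforced, the request fails, and unprotected text is never returned to callers. The one intentional fail-open path is NER sidecar failure, which falls back to builtin detection with reduced entity coverage.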

Failure Matrix

| Component Failure | OGuardAI Behavior | Fail Mode | Recovery |
| --- | --- | --- | --- |
| NER sidecar down | Falls back to builtin regex detection. Person, company, and location entities are not detected. Capabilities endpoint shows ner_active: false. Transform/detect responses include a warning. | Fail-open (reduced coverage) | Restart NER sidecar. Check /v1/capabilities for ner_active status. |
| NER sidecar slow (>5s) | Request times out waiting for NER, then falls back to builtin. Adds up to 5 seconds latency per request. | Fail-open (reduced coverage) | Scale NER sidecar or reduce detector.timeout_ms. |
| Redis unavailable | If Redis is the session backend, session operations fail. Sealed-session mode (default) is unaffected since it uses no external storage. Revocation lookups fail if Redis is the revocation backend. | Fail-closed (session ops rejected) | Switch to sealed sessions (stateless) or restore Redis. File-backed revocation works without Redis. |
| Disk full | File-backed revocation writes fail. Audit log writes fail. Server continues processing requests but revocation state may not persist. | Fail-closed (revocation writes rejected) | Free disk space. Revocation state in memory remains consistent until restart. |
| Config file invalid | Server refuses to start. Validation errors printed to stderr with line numbers. | Fail-closed (startup blocked) | Fix config. Run oguardai config validate before deploying. |
| Policy file missing | Default policy is applied. If no default policy exists, all entities are tokenized (maximum protection). Warning emitted to logs. | Fail-secure (default protection) | Add the missing policy file. Use oguardai config validate to check policy references. |
| Session blob expired | Clean error returned: GUARDAI_SESSION_EXPIRED. No data is leaked. Client must re-transform the original text to get a new session. | Fail-closed (request rejected) | Re-submit the original text through /v1/transform. Increase session.ttl_seconds if expirations are frequent. |
| Session blob tampered | AES-256-GCM authentication tag verification fails. Request rejected with GUARDAI_SESSION_INVALID. No data is leaked. | Fail-closed (request rejected) | Client must re-transform from original text. Investigate source of tampering. |
| Rate limit exceeded | HTTP 429 returned with Retry-After header. No data processed. | Fail-closed (request rejected) | Wait for rate limit window to reset. Increase rate_limit.requests_per_second or add instances. |
| Output guard detects new PII | Depending on config: mask replaces new PII with type labels, block rejects the entire response, warn passes through with warning flag. | Configurable | Review LLM output patterns. Adjust output guard sensitivity or action mode. |
| Session key rotated | All existing sealed session blobs become invalid. New transforms work normally. | Fail-closed (old sessions rejected) | Expected behavior during key rotation. Warn users that in-flight sessions will expire. |
| Prompt injection detected | Request is rejected or the injected content is neutralized, depending on prompt_security.action config. | Fail-closed (default) | Review rejected input. Adjust prompt security sensitivity if false positives occur. |
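
The recovery steps for session and rate-limit failures can be sketched from the caller's side. This is a minimal illustration, not OGuardAI client code: the function name recovery_action, the idea that the error code is available as a separate string, and the returned action labels are all assumptions made for the example. Only the two error codes, the HTTP 429 status, and the Retry-After header come from the failure matrix.

```python
from typing import Mapping, Optional, Tuple

# Error codes named in the failure matrix.
SESSION_EXPIRED = "GUARDAI_SESSION_EXPIRED"
SESSION_INVALID = "GUARDAI_SESSION_INVALID"


def recovery_action(
    status: int,
    error_code: Optional[str],
    headers: Mapping[str, str],
) -> Tuple[str, float]:
    """Map a failed response to (action, delay_seconds).

    The action labels are hypothetical names used only in this sketch.
    The headers mapping is assumed to use the canonical "Retry-After" casing.
    """
    if status == 429:
        # Honor Retry-After when it is a plain number of seconds;
        # fall back to 1 second if it is missing or not numeric
        # (for example, when the server sends an HTTP date instead).
        raw = headers.get("Retry-After", "")
        try:
            delay = max(float(raw), 0.0)
        except ValueError:
            delay = 1.0
        return ("retry_later", delay)
    if error_code in (SESSION_EXPIRED, SESSION_INVALID):
        # Expired or tampered blobs cannot be repaired by the client:
        # the original text must go through /v1/transform again.
        return ("retransform", 0.0)
    # Anything else is surfaced to the caller rather than retried blindly.
    return ("fail", 0.0)
```

A caller would typically sleep for the returned delay before retrying, and keep the original text available so that a "retransform" action is possible.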

Degradation Matrix

What still works when a component is unavailable:

| Component Down | Transform | Rehydrate | Detect | Revoke | Output Guard |
| --- | --- | --- | --- | --- | --- |
| NER sidecar | Partial (builtin only) | Full | Partial (builtin only) | Full | Full |
| Redis | Full (sealed sessions) | Full (sealed sessions) | Full | Degraded (no shared state) | Full |
| Disk | Full | Full | Full | Degraded (no persistence) | Full |
| Policy files | Full (default policy) | Full | Full | Full | Full (default action) |

"Partial" means the operation succeeds but with reduced entity coverage (15 of 18 types). "Degraded" means the operation succeeds but with reduced durability or consistency.

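For runbooks or dashboards it can help to hold the degradation matrix as data. The sketch below is one possible encoding, not part of OGuardAI: the component and operation keys and the function name operation_status are hypothetical, and the parenthetical qualifiers from the matrix (such as "sealed sessions") are dropped for brevity. It reports, for a set of unavailable components, which operations are partial or degraded.

```python
from typing import Dict, Iterable, List

OPERATIONS = ("transform", "rehydrate", "detect", "revoke", "output_guard")

# One entry per matrix row. Operations not listed for a component
# remain "full" while that component is down.
DEGRADATION: Dict[str, Dict[str, str]] = {
    "ner_sidecar": {"transform": "partial", "detect": "partial"},
    "redis": {"revoke": "degraded"},
    "disk": {"revoke": "degraded"},
    "policy_files": {},
}


def operation_status(down: Iterable[str]) -> Dict[str, List[str]]:
    """Return the impairments per operation; an empty list means full.

    Raises KeyError for a component that is not in the matrix.
    """
    impaired = {op: set() for op in OPERATIONS}
    for component in down:
        for op, level in DEGRADATION[component].items():
            impaired[op].add(level)
    return {op: sorted(levels) for op, levels in impaired.items()}
```

For example, with both the NER sidecar and Redis down, transform and detect are reported as partial, revoke as degraded, and rehydrate and output guard as unimpaired.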
Design Principles

  1. Never return unprotected text. If detection cannot run, the request fails rather than passing raw PII through.
  2. Cryptographic verification on every unseal. Tampered or expired session blobs are always rejected.
  3. Stateless by default. The sealed session model has no external dependencies. Redis and file backends are optional enhancements.
  4. Graceful NER fallback. NER unavailability reduces coverage but does not block the pipeline. This is the one intentional fail-open path, clearly documented and surfaced via health checks.
  5. Revoked values are permanently suppressed. Even if the session blob is valid, revoked entities always return [DELETED].
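
Principles 1 and 5 can be illustrated with a small sketch. It is a toy model under stated assumptions, not OGuardAI's implementation: the token format ([PERSON_0]), the detector signature, the ProtectionError class, and matching revocations by original value are all hypothetical. Session sealing and the cryptographic verification from principle 2 are left out entirely.

```python
import re
from typing import Callable, Dict, List, Set, Tuple

Entity = Tuple[int, int, str]  # (start, end, entity_type), non-overlapping


class ProtectionError(Exception):
    """Raised instead of returning text that could not be protected."""


def transform(
    text: str, detect: Callable[[str], List[Entity]]
) -> Tuple[str, Dict[str, str]]:
    """Replace detected spans with tokens; never return raw text on failure."""
    try:
        entities = detect(text)
    except Exception as exc:
        # Principle 1: if detection cannot run, the request fails.
        raise ProtectionError("detection unavailable") from exc
    mapping: Dict[str, str] = {}
    out = text
    # Replace from the end of the text so earlier offsets stay valid.
    for i, (start, end, etype) in enumerate(sorted(entities, reverse=True)):
        token = f"[{etype}_{i}]"
        mapping[token] = text[start:end]
        out = out[:start] + token + out[end:]
    return out, mapping


def rehydrate(text: str, mapping: Dict[str, str], revoked: Set[str]) -> str:
    """Restore tokens; revoked originals always come back as [DELETED]."""

    def restore(match: "re.Match[str]") -> str:
        token = match.group(0)
        if token not in mapping:
            return token  # not a token from this session; leave untouched
        value = mapping[token]
        # Principle 5: revocation wins even though the mapping is valid.
        return "[DELETED]" if value in revoked else value

    return re.sub(r"\[[A-Z]+_\d+\]", restore, text)
```

The important property is that transform has no code path that returns the input unchanged after a detector error, and rehydrate checks the revocation set on every token rather than once per session.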

Monitoring Recommendations

  • Poll /v1/capabilities periodically. Check the ner_active field to detect NER sidecar outages.
  • Track guardai_errors_total by error code. A spike in GUARDAI_SESSION_EXPIRED may indicate a TTL misconfiguration.
  • Track guardai_rate_limit_rejections_total. Sustained rejections indicate capacity issues.
  • Track guardai_output_guard_blocks_total. A high block rate may indicate that the LLM is generating PII.
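
The capabilities poll from the first recommendation could look roughly like the following. This is a hedged sketch using only the Python standard library: it assumes the endpoint returns a JSON object with a top-level boolean ner_active field and needs no authentication, neither of which is confirmed here. The function names and the returned status strings are hypothetical.

```python
import json
import urllib.request
from typing import Any, Callable, Dict


def fetch_capabilities(base_url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """GET {base_url}/v1/capabilities and parse the JSON body."""
    url = base_url.rstrip("/") + "/v1/capabilities"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def ner_alert(capabilities: Dict[str, Any]) -> bool:
    """True when ner_active is missing or anything other than true."""
    return capabilities.get("ner_active") is not True


def poll_once(
    base_url: str,
    fetch: Callable[[str], Dict[str, Any]] = fetch_capabilities,
) -> str:
    """Return "ok", "ner_down", or "unreachable" for a single poll."""
    try:
        caps = fetch(base_url)
    except (OSError, ValueError):
        # urllib's URLError is an OSError; malformed JSON raises ValueError.
        return "unreachable"
    return "ner_down" if ner_alert(caps) else "ok"
```

Treating a missing ner_active field as an alert keeps the monitor conservative: a changed response shape is reported rather than silently read as healthy. The fetch parameter exists so the check can be exercised without a running server.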