Security Guarantees
This document states explicitly what OGuardAI guarantees, what it does not guarantee, and what remains the customer's responsibility.
Canonical Terminology
These terms have precise meanings throughout OGuardAI. If you encounter them in APIs, configs, logs, or documentation, they mean exactly this:
| Term | Definition |
|---|---|
| Tokenized | A raw PII value has been replaced by a semantic token {{type:id}} in the output text. The original value is stored in the encrypted session blob. Tokenization is deterministic: the same value always produces the same token within a session. |
| Restorable | A tokenized value CAN be restored to its original form using the session state blob. Restorability depends on: (1) the session blob being valid and non-expired, (2) the entity not being revoked, (3) the policy allowing restore for that entity type and output channel. A value that was tokenized is restorable until its session expires or the entity is revoked. |
| Revoked | A value has been permanently marked as non-restorable. During rehydrate, revoked entities return [DELETED] instead of the original value. Revocation is persistent (survives server restart) and uses HMAC-SHA-256 hashing -- no raw PII is stored in the revocation table. Revoking a person cascades to all linked entities (email, phone, address) via belongs_to relationships. |
| Deleted | In OGuardAI's context, deletion means the original value is no longer recoverable by any means. This is achieved by: (1) revoking the entity, (2) allowing the session blob to expire, or (3) dropping the session secret. OGuardAI does NOT delete data in external systems (vector stores, databases, log sinks) -- that is the integrator's responsibility. |
| Blocked | The policy engine has determined that this entity type is not allowed in the given context. Blocked entities are removed from the output entirely (not tokenized, not passed through). The entities_blocked counter tracks this. In the output guard context, "blocked" means the entire response is rejected because new high-sensitivity PII was detected. |
| Masked | A value has been replaced with a type label like [EMAIL] or [SSN]. Masking is irreversible -- the original value cannot be recovered from the mask. Masking occurs in two places: (1) as a restore mode (masked), where the session value is intentionally obscured, and (2) in the output guard, where newly detected PII is replaced with type labels. |
| Redacted | Synonymous with "masked" in OGuardAI. We prefer "masked" in API surfaces and "redacted" only in human-facing documentation. They are functionally identical. |
| Detected | An entity span has been identified by the detection engine (regex, NER, or both). Detection does not imply tokenization -- the policy engine decides what happens to each detected entity (tokenize, block, or pass through). |
| Passed through | The policy engine has determined that this entity type is allowed in the given context. The raw value remains in the output unchanged. This is a deliberate policy decision, not a detection failure. |
| Session state | An AES-256-GCM encrypted blob containing the token map (token-to-original-value mappings), session metadata, and policy reference. The blob is opaque to clients and LLMs. It is the ONLY artifact that can restore tokenized values. |
| Trace ID | A UUID v4 identifier that correlates all operations in a request lifecycle: transform, proxy pass, tool calls, rehydrate, output guard, and revocation. Client-supplied or auto-generated. Use the same trace_id across transform and rehydrate to enable full incident reconstruction. |
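
To make the "Tokenized" definition concrete, the following is a minimal sketch of deterministic tokenization within one session. The `Session` class, the `transform` helper, the simplified email regex, and the numeric id scheme are all assumptions made for illustration, not OGuardAI's actual implementation; in the real system the token map lives inside the encrypted session blob.

```python
import re

# Simplified email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class Session:
    """Hypothetical in-memory token map (the real one is an encrypted blob)."""

    def __init__(self):
        self.token_for = {}  # (entity_type, value) -> token
        self.value_for = {}  # token -> original value
        self.counters = {}   # entity_type -> last id issued

    def tokenize(self, entity_type, value):
        key = (entity_type, value)
        if key not in self.token_for:
            # First time this value is seen in the session: issue a new id.
            next_id = self.counters.get(entity_type, 0) + 1
            self.counters[entity_type] = next_id
            token = "{{%s:%d}}" % (entity_type, next_id)
            self.token_for[key] = token
            self.value_for[token] = value
        # Same value, same session -> same token.
        return self.token_for[key]


def transform(text, session):
    """Replace every detected email with its session token."""
    return EMAIL_RE.sub(lambda m: session.tokenize("email", m.group(0)), text)


session = Session()
safe_text = transform(
    "Mail a@example.com, cc b@example.com, then a@example.com again.", session
)
print(safe_text)
```

The repeated address maps to the same token both times, which is what lets an LLM keep references consistent without ever seeing the raw value.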
What OGuardAI GUARANTEES
Data Protection
- Raw PII values NEVER appear in `safe_text` output (verified by 1,800+ unit tests)
- Tokenized text uses ONLY the canonical `{{type:id}}` format
- Sealed session blobs are encrypted with AES-256-GCM (authenticated encryption)
- Tampered session blobs are always rejected (cryptographic verification)
- Expired sessions produce clean errors, never data leaks
Detection
- Builtin mode (regex): 15 entity types are detected deterministically; the same input always produces the same output
- NER mode (GLiNER): 18 entity types, including person, company, and location; detection quality depends on the model
- Both modes: detection is applied to ALL string content in the configured scan scope
Restore
- `full` restore is byte-for-byte identical to the original value
- Each of the 6 restore modes produces predictable, documented output
- Channel-specific restore rules are applied deterministically per policy
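
The restore guarantees can be sketched as a small rehydrate function. Only the `full` and `masked` modes named in this document are modeled. The `rehydrate` function, the rule table keyed by `(entity type, channel)`, the channel names `chat` and `log`, and the choice to fall back to masking when no rule matches are assumptions for illustration.

```python
import re

TOKEN_RE = re.compile(r"\{\{([a-z_]+):([0-9]+)\}\}")


def rehydrate(text, token_map, rules, channel):
    """Restore tokens per (entity type, channel) rule.

    Returns the restored text plus a list of tokens that were not in the
    session's token map. Unknown tokens are left untouched and reported,
    never filled with invented values.
    """
    unresolved = []

    def restore(match):
        token, entity_type = match.group(0), match.group(1)
        if token not in token_map:
            unresolved.append(token)
            return token
        # Assumption: with no matching rule, fail closed and mask.
        mode = rules.get((entity_type, channel), "masked")
        if mode == "full":
            return token_map[token]          # byte-for-byte original
        return "[%s]" % entity_type.upper()  # irreversible type label

    return TOKEN_RE.sub(restore, text), unresolved


TOKEN_MAP = {"{{email:1}}": "jane@example.com"}
RULES = {("email", "chat"): "full", ("email", "log"): "masked"}
SAFE_TEXT = "Contact {{email:1}} or {{phone:9}}"

chat_text, chat_unresolved = rehydrate(SAFE_TEXT, TOKEN_MAP, RULES, "chat")
log_text, log_unresolved = rehydrate(SAFE_TEXT, TOKEN_MAP, RULES, "log")
print(chat_text)
print(log_text)
```

The same sealed value is restored in full on one channel and masked on another, and `{{phone:9}}`, which the session never issued, is reported rather than resolved.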
Revocation
- Revoked entity values ALWAYS return `[DELETED]` during rehydrate
- Cascade revocation: revoking a person suppresses ALL linked entities (email, phone, address)
- Revocation is persistent (file or Redis backend)
- Revocation uses HMAC-SHA-256 -- no raw PII stored in the revocation table
Revocation Contract
The following are the canonical, binding guarantees for OGuardAI's revocation system:
- Revocation affects FUTURE rehydrate calls only -- outputs already delivered to end users or downstream systems cannot be clawed back.
- Sealed session blobs remain decryptable after revocation, but any revoked value resolves to `[DELETED]` instead of the original.
- Revoking a person entity cascades to every entity linked via `belongs_to` relationships (email, phone, address) -- all linked entities also resolve to `[DELETED]`.
- Multiple entities can be revoked in a single API call (bulk revoke).
- Revocation state survives server restart when using the file backend and is shared across instances when using the Redis backend.
- Vector stores, external databases, and application caches must still delete their own copies of data -- OGuardAI cannot reach into external systems.
- Revocation is irreversible -- there is no "un-revoke" operation.
- The revocation table stores only HMAC-SHA-256 hashes of entity values -- no raw PII is ever stored in the revocation table itself.
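
The contract above can be illustrated with a minimal in-memory sketch. The `RevocationTable` class, the `fingerprint` helper, the demo key, and the shape of the `belongs_to` mapping are hypothetical; they only show how a table of HMAC-SHA-256 digests can answer "is this value revoked?" without storing raw PII, and how a person revocation can cascade to linked entities.

```python
import hashlib
import hmac

# Assumption: a server-side secret; this demo value is not a real key.
REVOCATION_KEY = b"demo-key-not-for-production"


def fingerprint(value):
    """HMAC-SHA-256 digest of a raw value; only this digest is stored."""
    return hmac.new(REVOCATION_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


class RevocationTable:
    def __init__(self):
        self._digests = set()  # hex digests only, never raw values

    def revoke(self, values, belongs_to):
        """Bulk revoke: each value plus every entity linked to it."""
        for value in values:
            self._digests.add(fingerprint(value))
            for child, owner in belongs_to.items():
                if owner == value:
                    self._digests.add(fingerprint(child))

    def resolve(self, value):
        """What a future rehydrate call would return for this value."""
        return "[DELETED]" if fingerprint(value) in self._digests else value

    # Deliberately no un-revoke method: revocation is irreversible.


# Hypothetical belongs_to links: linked entity -> owning person.
belongs_to = {
    "jane@example.com": "Jane Doe",
    "+1-555-0100": "Jane Doe",
    "bob@example.com": "Bob Roe",
}

table = RevocationTable()
table.revoke(["Jane Doe"], belongs_to)

print(table.resolve("jane@example.com"))  # linked to the revoked person
print(table.resolve("bob@example.com"))   # unrelated entity
```

Because lookups hash the candidate value and compare digests, the table can be persisted to a file or Redis without that store ever holding the original PII.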
Session Security
- Cross-tenant session access is always rejected
- Key rotation instantly invalidates all existing sealed session blobs
- Session TTL is enforced -- expired blobs cannot be unsealed
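
The order of checks behind these guarantees can be sketched as follows. Important caveat: this sketch uses HMAC-SHA-256 signing as a stand-in, because Python's standard library has no AES-GCM. It authenticates the blob but does NOT encrypt it, so it is not equivalent to OGuardAI's AES-256-GCM sealing. The `seal`, `unseal`, and `is_rejected` functions and the payload fields `tenant` and `expires_at` are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json


class SessionError(Exception):
    """Raised for any blob that must not be unsealed."""


def seal(payload, key):
    # Stand-in only: authenticated but NOT encrypted (see caveat above).
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(tag + body).decode("ascii")


def unseal(blob, key, tenant, now):
    try:
        raw = base64.urlsafe_b64decode(blob.encode("ascii"))
    except ValueError:
        raise SessionError("invalid session blob")
    tag, body = raw[:32], raw[32:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # 1. Integrity first: a tampered blob or a rotated key fails here.
    if not hmac.compare_digest(tag, expected):
        raise SessionError("invalid session blob")
    payload = json.loads(body)
    # 2. Tenant isolation.
    if payload["tenant"] != tenant:
        raise SessionError("tenant mismatch")
    # 3. TTL enforcement.
    if now >= payload["expires_at"]:
        raise SessionError("session expired")
    return payload


def is_rejected(blob, key, tenant, now):
    try:
        unseal(blob, key, tenant, now)
    except SessionError:
        return True
    return False


old_key, new_key = b"k1" * 16, b"k2" * 16
blob = seal({"tenant": "acme", "expires_at": 1000}, old_key)

# Flip one bit in the body to simulate tampering.
raw = bytearray(base64.urlsafe_b64decode(blob))
raw[40] ^= 1
tampered = base64.urlsafe_b64encode(bytes(raw)).decode("ascii")

print(unseal(blob, old_key, "acme", now=999))
print(is_rejected(tampered, old_key, "acme", 999))
```

Verifying integrity before reading any field means a forged tenant or expiry value is never trusted, and a blob sealed under a rotated-out key fails at the same first step.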
Policy
- Policy rules are evaluated deterministically for every entity
- Policy inheritance resolves child > parent > default
- Policy integrity can be verified via HMAC signatures
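
As a rough illustration of the inheritance order and of HMAC-based integrity checking, the sketch below resolves rules child first, then parent, then default, and signs a canonical JSON form of the policy. The policy contents, the action names, and the `sign_policy` and `verify_policy` helpers are hypothetical; OGuardAI's actual policy schema and signing format may differ.

```python
import hashlib
import hmac
import json
from collections import ChainMap

# Hypothetical policy layers: entity type -> action.
default_policy = {"email": "tokenize", "ssn": "tokenize", "url": "pass_through"}
parent_policy = {"ssn": "block"}
child_policy = {"url": "tokenize"}

# ChainMap looks keys up left to right: child > parent > default.
effective = ChainMap(child_policy, parent_policy, default_policy)


def sign_policy(policy, key):
    """HMAC-SHA-256 over a canonical JSON form (sorted keys, no spaces)."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_policy(policy, signature, key):
    """Constant-time comparison of the expected and supplied signatures."""
    return hmac.compare_digest(sign_policy(policy, key), signature)


key = b"demo-policy-key"
signature = sign_policy(dict(effective), key)

tampered = dict(effective)
tampered["ssn"] = "pass_through"

print(dict(effective))
print(verify_policy(dict(effective), signature, key))
print(verify_policy(tampered, signature, key))
```

Serializing with sorted keys keeps the signature stable regardless of the order in which rules were written.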
What OGuardAI DOES NOT Guarantee
Detection Completeness
- No detection system catches 100% of PII in all contexts
- Person/company/location detection REQUIRES the Python NER sidecar
- In builtin-only mode, person names, company names, and locations are NOT detected
- OCR text extraction is best-effort -- noisy scans may produce detection gaps
- Custom or domain-specific entity types beyond the 15 built-in types (18 with NER) are not detected
External System Deletion
- OGuardAI does NOT control vector stores, external databases, or log sinks
- RAG chunk deletion in vector stores is the application's responsibility
- Log retention and purging are managed by the logging infrastructure
- OGuardAI provides guidance and signals, but cannot enforce external cleanup
Provider Behavior
- LLM output quality depends on the provider (OpenAI, Anthropic, etc.)
- Token damage patterns vary by provider and model version
- Token repair is best-effort, using a 3-stage pipeline (strict -> repair -> fuzzy)
- Hallucinated tokens are flagged as unresolved, never fabricated
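
One way to picture the strict -> repair -> fuzzy pipeline is sketched below. The regexes, the 0.75 similarity cutoff, and the rule that fuzzy matching may only correct the type name while the id must match exactly are assumptions for illustration, not OGuardAI's actual repair rules.

```python
import difflib
import re

STRICT = re.compile(r"\{\{([a-z_]+):([0-9]+)\}\}")
# Tolerates single braces, stray whitespace, and wrong letter case.
LOOSE = re.compile(r"\{{1,2}\s*([A-Za-z_]+)\s*:\s*([0-9]+)\s*\}{1,2}")


def resolve_token(raw, known_tokens):
    """Map a possibly damaged token to a token the session issued.

    Returns (token, stage); token is None when nothing matches, so the
    caller can flag it as unresolved instead of guessing a value.
    """
    # Stage 1 (strict): canonical form that the session actually issued.
    if STRICT.fullmatch(raw) and raw in known_tokens:
        return raw, "strict"

    # Stage 2 (repair): normalize braces, whitespace, and case.
    match = LOOSE.fullmatch(raw.strip())
    if match is None:
        return None, "unresolved"
    entity_type, entity_id = match.group(1).lower(), match.group(2)
    candidate = "{{%s:%s}}" % (entity_type, entity_id)
    if candidate in known_tokens:
        return candidate, "repair"

    # Stage 3 (fuzzy): correct a misspelled type name, id must be identical.
    same_id = {}
    for token in known_tokens:
        known = STRICT.fullmatch(token)
        if known and known.group(2) == entity_id:
            same_id[known.group(1)] = token
    close = difflib.get_close_matches(entity_type, list(same_id), n=1, cutoff=0.75)
    if close:
        return same_id[close[0]], "fuzzy"
    return None, "unresolved"


known = {"{{email:1}}", "{{person:1}}"}
for damaged in ["{{email:1}}", "{ Email : 1 }", "{{emial:1}}", "{{email:7}}"]:
    print(damaged, "->", resolve_token(damaged, known))
```

Keeping the id exact during the fuzzy stage prevents a hallucinated token such as `{{email:7}}` from being silently mapped onto a real value from the session.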
Performance Under Load
- Latency depends on detector mode, payload size, and NER sidecar availability
- NER mode adds 50-500ms per request depending on text length
- If the NER sidecar is configured but unavailable, each request can incur a timeout of up to 5s
- Rate limiting is per-instance, not shared across instances
Distributed Consistency
- Sealed sessions are stateless -- work across any number of instances
- Revocation with file backend is per-instance (use Redis for multi-instance)
- Rate limiting is per-instance (use API gateway for global limits)
- Metrics are per-instance (use Prometheus for aggregation)
Customer Responsibilities
| Responsibility | Why |
|---|---|
| Start the NER sidecar if person/company/location detection is needed | NER is optional |
| Delete vector store chunks after RAG delete | OGuardAI doesn't own vector stores |
| Rotate session keys on compromise | Key management is ops responsibility |
| Configure appropriate policy for use case | Policy selection affects protection level |
| Monitor health endpoint | Detect degraded mode early |
| Set `auth.mode` to non-dev for production | Dev mode bypasses auth |
| Set `session.secret` to a real secret | Using the default secret triggers a warning |
| Configure log retention | OGuardAI emits audit events, retention is infra |
Failure Modes
| Failure | OGuardAI Behavior |
|---|---|
| NER sidecar down | Falls back to builtin regex (person/company/location missed) |
| Invalid session blob | Clean error returned, no data leaked |
| Malformed LLM output | 3-stage token repair attempted, unresolved tokens flagged |
| Policy not found | Default policy applied |
| Output guard catches new PII | Masked or blocked per config |
| Rate limit exceeded | HTTP 429 with Retry-After header |
| File too large | HTTP 400 rejection |
| Revoked entity in rehydrate | Returns [DELETED] |
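
The first row of the table, falling back to builtin regex when the NER sidecar is down, can be sketched as follows. The function names, the simplified email regex, the stubbed sidecar calls, and the `degraded` flag are assumptions for illustration; a real sidecar call would be an HTTP or RPC request with a timeout.

```python
import re

# Simplified email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def builtin_detect(text):
    """Regex-only detection; always available."""
    return [("email", m.start(), m.end()) for m in EMAIL_RE.finditer(text)]


def detect(text, ner_detect=None):
    """Run builtin detection, then NER if configured.

    If the NER call fails, keep the regex results and report degraded
    mode so that monitoring can surface the gap.
    """
    spans = builtin_detect(text)
    degraded = False
    if ner_detect is not None:
        try:
            spans.extend(ner_detect(text))
        except (ConnectionError, TimeoutError):
            degraded = True  # person/company/location will be missed
    return sorted(spans, key=lambda span: span[1]), degraded


def fake_ner(text):
    """Stub for a healthy sidecar that recognizes one hard-coded name."""
    name = "Jane Doe"
    start = text.find(name)
    return [("person", start, start + len(name))] if start >= 0 else []


def sidecar_down(text):
    """Stub for an unreachable sidecar."""
    raise ConnectionError("NER sidecar unreachable")


text = "Jane Doe wrote from jane@example.com"
healthy_spans, healthy_degraded = detect(text, fake_ner)
fallback_spans, fallback_degraded = detect(text, sidecar_down)
print([s[0] for s in healthy_spans], healthy_degraded)
print([s[0] for s in fallback_spans], fallback_degraded)
```

In the fallback case the person name is silently absent from the spans, which is why the customer responsibilities above include monitoring the health endpoint.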
What Depends on NER Mode
Not all features work identically in builtin-only vs NER mode. This matrix clarifies:
| Capability | Builtin (regex) | NER (GLiNER) | Notes |
|---|---|---|---|
| Email detection | Yes | Yes | Regex in both modes |
| Phone detection | Yes | Yes | Regex in both modes |
| SSN / IBAN / CC | Yes | Yes | Regex in both modes |
| IP / URL / DOB | Yes | Yes | Regex in both modes |
| Order / Ticket | Yes | Yes | Regex in both modes |
| Person name | No | Yes | Requires NER sidecar |
| Company name | No | Yes | Requires NER sidecar |
| Location | No | Yes | Requires NER sidecar |
| Address (structured) | Partial | Yes | Regex detects 7 country formats; NER detects any language |
| Entity linking | Limited | Full | Person-to-entity links need person detection (NER) |
| Cascade revocation | Limited | Full | Cascade from person to linked entities needs NER for person detection |
| Detection latency | Under 5ms | 50-500ms | NER adds model inference time |
| Determinism | 100% | Model-dependent | Same input may produce slightly different NER confidence scores across model versions |
Bottom line: Builtin mode is fast and deterministic but misses person/company/location. NER mode catches more entity types but adds latency and model dependency.
What Depends on Policy
| Feature | Default Policy | Strict PII Policy | Enterprise Policy |
|---|---|---|---|
| Tokenize email | Yes | Yes | Yes |
| Tokenize phone | Yes | Yes | Yes |
| Block SSN | No | Yes | Per-channel |
| Block IBAN | No | Yes | Per-channel |
| Pass through URL | Yes | No | Per-channel |
| Restore mode | Full | Masked | Per-channel |
| Output guard action | Mask | Block | Per-entity-type |
| Cascade revocation | Available | Available | Available |
| Shadow mode | Config-level | Config-level | Config-level |
Bottom line: Policy controls what happens to detected entities. Detection is independent of policy -- it runs first and finds everything. Policy then decides: tokenize, block, or pass through.
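
The split between detection and policy can be sketched as two steps: detect everything, then apply one of three actions per entity. The regexes, the action names, and the two policy dictionaries are illustrative assumptions loosely modeled on the table above. In particular, tokenizing SSNs under the default policy and tokenizing URLs under the strict policy are guesses, since the table only states what is blocked or passed through.

```python
import re

# Simplified detectors, for illustration only.
DETECTORS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "url": re.compile(r"https?://[^\s,]+"),
}

# Hypothetical policies: entity type -> action.
DEFAULT_POLICY = {"email": "tokenize", "ssn": "tokenize", "url": "pass_through"}
STRICT_POLICY = {"email": "tokenize", "ssn": "block", "url": "tokenize"}


def detect(text):
    """Detection runs first and is independent of policy."""
    spans = []
    for entity_type, pattern in DETECTORS.items():
        spans.extend((m.start(), m.end(), entity_type) for m in pattern.finditer(text))
    return sorted(spans)


def apply_policy(text, policy):
    """Apply tokenize / block / pass_through to each detected span."""
    out, cursor, tokens, counters = [], 0, {}, {}
    stats = {"tokenize": 0, "block": 0, "pass_through": 0}
    for start, end, entity_type in detect(text):
        action = policy[entity_type]
        stats[action] += 1
        out.append(text[cursor:start])
        value = text[start:end]
        if action == "tokenize":
            key = (entity_type, value)
            if key not in tokens:
                counters[entity_type] = counters.get(entity_type, 0) + 1
                tokens[key] = "{{%s:%d}}" % (entity_type, counters[entity_type])
            out.append(tokens[key])
        elif action == "pass_through":
            out.append(value)  # deliberate policy decision, value unchanged
        # "block": emit nothing, the entity is removed from the output
        cursor = end
    out.append(text[cursor:])
    return "".join(out), stats


text = "Reach a@example.com, SSN 123-45-6789, docs at https://example.org/help"
default_text, default_stats = apply_policy(text, DEFAULT_POLICY)
strict_text, strict_stats = apply_policy(text, STRICT_POLICY)
print(default_text)
print(strict_text)
```

Both runs detect the same three entities; only the policy changes what ends up in the output.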
Lifecycle of a Protected Value
At each stage, the value can only move forward (detected -> tokenized -> restored). It cannot be "un-tokenized" without the session blob, and it cannot be "un-revoked" once revoked.