What OGuardAI guarantees, what it does not guarantee, and what remains the customer's responsibility

This document states explicitly what OGuardAI guarantees, what it does not guarantee, and what remains the customer's responsibility.

Canonical Terminology

These terms have precise meanings throughout OGuardAI. If you encounter them in APIs, configs, logs, or documentation, they mean exactly this:

Term	Definition
Tokenized	A raw PII value has been replaced by a semantic token `{{type:id}}` in the output text. The original value is stored in the encrypted session blob. Tokenization is deterministic: the same value always produces the same token within a session.
Restorable	A tokenized value CAN be restored to its original form using the session state blob. Restorability depends on: (1) the session blob being valid and non-expired, (2) the entity not being revoked, (3) the policy allowing restore for that entity type and output channel. A tokenized value therefore stays restorable until its session expires or the entity is revoked, provided the policy permits restore on the requested channel.
Revoked	A value has been permanently marked as non-restorable. During rehydrate, revoked entities return `[DELETED]` instead of the original value. Revocation is persistent (survives server restart) and uses HMAC-SHA-256 hashing -- no raw PII is stored in the revocation table. Revoking a person cascades to all linked entities (email, phone, address) via `belongs_to` relationships.
Deleted	In OGuardAI's context, deletion means the original value is no longer recoverable by any means. This is achieved by: (1) revoking the entity, (2) allowing the session blob to expire, or (3) dropping the session secret. OGuardAI does NOT delete data in external systems (vector stores, databases, log sinks) -- that is the integrator's responsibility.
Blocked	The policy engine has determined that this entity type is not allowed in the given context. Blocked entities are removed from the output entirely (not tokenized, not passed through). The `entities_blocked` counter tracks this. In the output guard context, "blocked" means the entire response is rejected because new high-sensitivity PII was detected.
Masked	A value has been replaced with a type label like `[EMAIL]` or `[SSN]`. Masking is irreversible -- the original value cannot be recovered from the mask. Masking occurs in two places: (1) as a restore mode (`masked`), where the session value is intentionally obscured, and (2) in the output guard, where newly detected PII is replaced with type labels.
Redacted	Synonymous with "masked" in OGuardAI. We prefer "masked" in API surfaces and "redacted" only in human-facing documentation. They are functionally identical.
Detected	An entity span has been identified by the detection engine (regex, NER, or both). Detection does not imply tokenization -- the policy engine decides what happens to each detected entity (tokenize, block, or pass through).
Passed through	The policy engine has determined that this entity type is allowed in the given context. The raw value remains in the output unchanged. This is a deliberate policy decision, not a detection failure.
Session state	An AES-256-GCM encrypted blob containing the token map (token-to-original-value mappings), session metadata, and policy reference. The blob is opaque to clients and LLMs. It is the ONLY artifact that can restore tokenized values.
Trace ID	A UUID v4 identifier that correlates all operations in a request lifecycle: transform, proxy pass, tool calls, rehydrate, output guard, and revocation. The ID is either supplied by the client or generated automatically. Use the same `trace_id` across transform and rehydrate to enable full incident reconstruction.
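
The "Tokenized" and "Restorable" definitions above can be illustrated with a minimal sketch. It is not OGuardAI's implementation: the class name `SessionTokenizer`, the in-memory dictionaries, and the sequential integer ids are assumptions made for illustration (the source specifies only the `{{type:id}}` shape and per-session determinism).

```python
import re

# Canonical token shape; numeric ids are an assumption of this sketch.
TOKEN_RE = re.compile(r"\{\{[a-z_]+:\d+\}\}")


class SessionTokenizer:
    """Hypothetical in-memory stand-in for one session's token map."""

    def __init__(self):
        self._token_for = {}   # (entity_type, raw value) -> token
        self._value_for = {}   # token -> raw value
        self._last_id = {}     # entity_type -> last id issued

    def tokenize(self, entity_type, value):
        key = (entity_type, value)
        if key not in self._token_for:
            # First sighting in this session: issue the next id for the type.
            next_id = self._last_id.get(entity_type, 0) + 1
            self._last_id[entity_type] = next_id
            token = "{{%s:%d}}" % (entity_type, next_id)
            self._token_for[key] = token
            self._value_for[token] = value
        # Repeat sightings return the token issued the first time.
        return self._token_for[key]

    def restore(self, text):
        # Tokens that are not in the map are left untouched, never guessed.
        return TOKEN_RE.sub(
            lambda m: self._value_for.get(m.group(0), m.group(0)), text
        )
```

Calling `tokenize("email", "a@example.com")` twice on the same instance returns the same token, which is the property the "Tokenized" row describes. A real session would keep this map inside the encrypted session blob rather than in plain memory.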

What OGuardAI GUARANTEES

Data Protection

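Raw values of entities that the policy tokenizes or blocks NEVER appear in `safe_text` output (verified by 1,800+ unit tests); entities the policy passes through remain visible by design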
Tokenized text uses ONLY the canonical `{{type:id}}` format
Sealed session blobs are encrypted with AES-256-GCM (authenticated encryption)
Tampered session blobs are always rejected (cryptographic verification)
Expired sessions produce clean errors, never data leaks

Detection

Builtin mode (regex): 15 entity types detected deterministically (same input, same output)
NER mode (GLiNER): 18 entity types including person/company/location; detection quality depends on the model
Both modes: detection is applied to ALL string content in the configured scan scope

Restore

`full` restore is byte-for-byte identical to the original value
Each of the 6 restore modes produces predictable, documented output
Channel-specific restore rules are applied deterministically per policy
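
As a rough illustration of restore modes and channel rules, the sketch below covers only the two modes this document names, `full` and `masked`. The function names, the `CHANNEL_RULES` table, and the channel names `chat` and `log` are hypothetical and are not taken from OGuardAI's configuration schema.

```python
def restore_value(entity_type, original, mode):
    """Apply one restore mode to a single value."""
    if mode == "full":
        return original                      # unchanged, byte for byte
    if mode == "masked":
        return "[%s]" % entity_type.upper()  # type label, e.g. [EMAIL]
    raise ValueError("mode not covered by this sketch: %r" % mode)


# Hypothetical per-channel rules: (entity type, channel) -> restore mode.
CHANNEL_RULES = {
    ("email", "chat"): "full",
    ("email", "log"): "masked",
}


def restore_for_channel(entity_type, original, channel, default_mode="masked"):
    # Channels without an explicit rule fall back to the more
    # conservative mode (an assumption of this sketch).
    mode = CHANNEL_RULES.get((entity_type, channel), default_mode)
    return restore_value(entity_type, original, mode)
```

Because the rule lookup is a plain table, the same entity type, channel, and value always produce the same output, which is what "applied deterministically per policy" amounts to in this sketch.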

Revocation

Revoked entity values ALWAYS return [DELETED] during rehydrate
Cascade revocation: revoking a person suppresses ALL linked entities (email, phone, address)
Revocation is persistent (file or Redis backend)
Revocation uses HMAC-SHA-256 -- no raw PII stored in the revocation table

Revocation Contract

The following are the canonical, binding guarantees for OGuardAI's revocation system:

Revocation affects FUTURE rehydrate calls only -- outputs already delivered to end users or downstream systems cannot be clawed back.
Sealed session blobs remain decryptable after revocation, but any revoked value resolves to [DELETED] instead of the original.
Revoking a person entity cascades to every entity linked via belongs_to relationships (email, phone, address) -- all linked entities also resolve to [DELETED].
Multiple entities can be revoked in a single API call (bulk revoke).
Revocation state survives server restart when using the file backend and is shared across instances when using the Redis backend.
Vector stores, external databases, and application caches must still delete their own copies of data -- OGuardAI cannot reach into external systems.
Revocation is irreversible -- there is no "un-revoke" operation.
The revocation table stores only HMAC-SHA-256 hashes of entity values -- no raw PII is ever stored in the revocation table itself.
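
The contract above can be sketched with the Python standard library. This is an illustration under stated assumptions, not OGuardAI's code: `RevocationTable`, `revoke_with_cascade`, `rehydrate`, and the shape of the `belongs_to` mapping (child token to owner token) are all hypothetical names chosen for the example.

```python
import hashlib
import hmac


class RevocationTable:
    """Stores only HMAC-SHA-256 digests of revoked values, never the values."""

    def __init__(self, secret: bytes):
        self._secret = secret
        self._revoked = set()  # hex digests only

    def _digest(self, value: str) -> str:
        return hmac.new(self._secret, value.encode("utf-8"), hashlib.sha256).hexdigest()

    def revoke(self, value: str) -> None:
        # There is deliberately no method that removes a digest again.
        self._revoked.add(self._digest(value))

    def is_revoked(self, value: str) -> bool:
        return self._digest(value) in self._revoked


def revoke_with_cascade(table, token_map, belongs_to, token):
    """Revoke `token` and every token whose belongs_to owner is `token`."""
    targets = [token] + [child for child, owner in belongs_to.items() if owner == token]
    for target in targets:
        table.revoke(token_map[target])


def rehydrate(token_map, table, text):
    """Replace tokens with originals, or with [DELETED] when revoked."""
    for token, original in token_map.items():
        replacement = "[DELETED]" if table.is_revoked(original) else original
        text = text.replace(token, replacement)
    return text
```

Note that the check happens at rehydrate time, which is why revocation affects future rehydrate calls only: text that was restored before `revoke` ran has already left the system.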

Session Security

Cross-tenant session access is always rejected
Key rotation instantly invalidates all existing sealed session blobs
Session TTL is enforced -- expired blobs cannot be unsealed

Policy

Policy rules are evaluated deterministically for every entity
Policy inheritance resolves in the order child > parent > default (the most specific rule wins)
Policy integrity can be verified via HMAC signatures
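
The inheritance order and the integrity check can be sketched as follows. The policy layout (a flat mapping from entity type to action), the action names, and the use of HMAC-SHA-256 over canonical JSON are assumptions of this example; the source states only that policies inherit child > parent > default and can be verified via HMAC signatures.

```python
import hashlib
import hmac
import json

# Hypothetical default layer; real policies are likely richer than this.
DEFAULT_POLICY = {"email": "tokenize", "url": "pass_through", "ssn": "tokenize"}


def resolve_action(entity_type, child, parent, default=DEFAULT_POLICY):
    """Return the first rule found, searching child, then parent, then default."""
    for layer in (child, parent, default):
        if entity_type in layer:
            return layer[entity_type]
    raise KeyError(entity_type)


def sign_policy(policy, key: bytes) -> str:
    # Canonical JSON so that key order cannot change the signature.
    payload = json.dumps(policy, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_policy(policy, key: bytes, signature: str) -> bool:
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sign_policy(policy, key), signature)
```

With `child = {"ssn": "block"}` and `parent = {"ssn": "tokenize", "url": "tokenize"}`, `resolve_action("ssn", child, parent)` yields `block` from the child, while `url` falls through to the parent and `email` to the default.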

What OGuardAI DOES NOT Guarantee

Detection Completeness

No detection system catches 100% of PII in all contexts
Person/company/location detection REQUIRES the Python NER sidecar
In builtin-only mode, person names, company names, and locations are NOT detected
OCR text extraction is best-effort -- noisy scans may produce detection gaps
Custom or domain-specific entity types beyond the 15 built-in types (18 with NER) are not detected

External System Deletion

OGuardAI does NOT control vector stores, external databases, or log sinks
RAG chunk deletion in vector stores is the application's responsibility
Log retention and purging is managed by the logging infrastructure
OGuardAI provides guidance and signals, but cannot enforce external cleanup

Provider Behavior

LLM output quality depends on the provider (OpenAI, Anthropic, etc.)
Token damage patterns vary by provider and model version
Token repair is best-effort and runs a 3-stage pipeline (strict -> repair -> fuzzy)
Hallucinated tokens are flagged as unresolved, never fabricated
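
One way to picture a strict -> repair -> fuzzy pipeline is sketched below. The stage boundaries, the regular expressions, and the similarity cutoff are assumptions of this example rather than OGuardAI's actual rules. To stay consistent with "never fabricated", the fuzzy stage here tolerates a misspelled type name only when the id matches a known token exactly.

```python
import difflib
import re

STRICT = re.compile(r"\{\{([a-z_]+):(\d+)\}\}")
# Tolerates extra or missing braces, stray whitespace, and case changes.
LOOSE = re.compile(r"\{+\s*([A-Za-z_]+)\s*:\s*(\d+)\s*\}+")


def repair_tokens(text, token_map):
    """Return (restored text, list of unresolved token strings)."""
    # Assumes every key of token_map is already in canonical form.
    known = {STRICT.fullmatch(tok).groups(): tok for tok in token_map}
    unresolved = []

    def resolve(match):
        raw = match.group(0)
        if raw in token_map:                      # stage 1: strict
            return token_map[raw]
        etype, eid = match.group(1).lower(), match.group(2)
        if (etype, eid) in known:                 # stage 2: repair
            return token_map[known[(etype, eid)]]
        # Stage 3: fuzzy match on the type name only, same id required.
        types_with_id = [t for (t, i) in known if i == eid]
        close = difflib.get_close_matches(etype, types_with_id, n=2, cutoff=0.8)
        if len(close) == 1:
            return token_map[known[(close[0], eid)]]
        unresolved.append(raw)                    # flagged, left as-is
        return raw

    return LOOSE.sub(resolve, text), unresolved
```

A token such as `{{email:7}}` that was never issued in the session ends up in the unresolved list and stays in the text verbatim, rather than being mapped to the nearest real token.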

Performance Under Load

Latency depends on detector mode, payload size, and NER sidecar availability
NER mode adds 50-500ms per request depending on text length
If the NER sidecar is configured but unavailable, each request can wait up to 5s for the sidecar call to time out
Rate limiting is per-instance, not shared across instances

Distributed Consistency

Sealed sessions are stateless -- they work across any number of instances
Revocation with the file backend is per-instance (use the Redis backend for multi-instance deployments)
Rate limiting is per-instance (use API gateway for global limits)
Metrics are per-instance (use Prometheus for aggregation)

Customer Responsibilities

Responsibility	Why
Start NER sidecar if person/company/location needed	NER is optional
Delete vector store chunks after RAG delete	OGuardAI doesn't own vector stores
Rotate session keys on compromise	Key management is an ops responsibility
Configure appropriate policy for use case	Policy selection affects protection level
Monitor health endpoint	Detect degraded mode early
Set `auth.mode` to non-dev for production	Dev mode bypasses auth
Set `session.secret` to a real secret	OGuardAI warns when the default secret is in use
Configure log retention	OGuardAI emits audit events; retention is an infrastructure concern

Failure Modes

Failure	OGuardAI Behavior
NER sidecar down	Falls back to builtin regex (person, company, and location entities are missed)
Invalid session blob	Clean error returned, no data leaked
Malformed LLM output	3-stage token repair attempted, unresolved tokens flagged
Policy not found	Default policy applied
Output guard catches new PII	Masked or blocked per config
Rate limit exceeded	HTTP 429 with Retry-After header
File too large	HTTP 400 rejection
Revoked entity in rehydrate	Returns [DELETED]
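
The first row of the table, the fallback when the NER sidecar is down, might look roughly like the sketch below. The function names, the single email pattern standing in for the builtin detectors, the callable `ner_client`, and the `degraded` flag are assumptions for illustration; the actual sidecar protocol is not described in this document.

```python
import re

# One pattern stands in for the builtin regex detectors.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def builtin_detect(text):
    return [("email", m.group(0)) for m in EMAIL.finditer(text)]


def detect(text, ner_client=None):
    """Return (entities, degraded). Builtin detection always runs."""
    entities = builtin_detect(text)
    degraded = False
    if ner_client is not None:
        try:
            entities += ner_client(text)
        except (TimeoutError, ConnectionError):
            # Sidecar unreachable: keep the regex results and report
            # degraded mode so that monitoring can pick it up.
            degraded = True
    return entities, degraded


# Hypothetical clients used only to exercise both paths.
def fake_ner(text):
    return [("person", "Ada")]


def sidecar_down(text):
    raise ConnectionError("NER sidecar unavailable")
```

With `sidecar_down`, `detect` still returns the email entity but no person entity, which is the gap the table describes and the reason the health endpoint is worth monitoring.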

What Depends on NER Mode

Not all features work identically in builtin-only vs NER mode. This matrix clarifies:

Capability	Builtin (regex)	NER (GLiNER)	Notes
Email detection	Yes	Yes	Regex in both modes
Phone detection	Yes	Yes	Regex in both modes
SSN / IBAN / CC	Yes	Yes	Regex in both modes
IP / URL / DOB	Yes	Yes	Regex in both modes
Order / Ticket	Yes	Yes	Regex in both modes
Person name	No	Yes	Requires NER sidecar
Company name	No	Yes	Requires NER sidecar
Location	No	Yes	Requires NER sidecar
Address (structured)	Partial	Yes	Regex detects 7 country formats; NER detects any language
Entity linking	Limited	Full	Person-to-entity links need person detection (NER)
Cascade revocation	Limited	Full	Cascade from person to linked entities needs NER for person detection
Detection latency	Under 5ms	50-500ms	NER adds model inference time
Determinism	100%	Model-dependent	Same input may produce slightly different NER confidence scores across model versions

Bottom line: Builtin mode is fast and deterministic but misses person/company/location. NER mode catches more entity types but adds latency and model dependency.

What Depends on Policy

Feature	Default Policy	Strict PII Policy	Enterprise Policy
Tokenize email	Yes	Yes	Yes
Tokenize phone	Yes	Yes	Yes
Block SSN	No	Yes	Per-channel
Block IBAN	No	Yes	Per-channel
Pass through URL	Yes	No	Per-channel
Restore mode	Full	Masked	Per-channel
Output guard action	Mask	Block	Per-entity-type
Cascade revocation	Available	Available	Available
Shadow mode	Config-level	Config-level	Config-level

Bottom line: Policy controls what happens to detected entities. Detection is independent of policy -- it runs first and reports every entity it can find. Policy then decides what happens to each one: tokenize, block, or pass through.
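
The split between detection and policy can be sketched as a function that takes already-detected spans and applies one of the three decisions to each. The span format `(start, end, entity_type)`, the action strings, and the choice to tokenize when no rule exists are assumptions of this example.

```python
def apply_policy(text, entities, policy):
    """entities: non-overlapping (start, end, entity_type) spans from detection.

    Returns (safe_text, entities_blocked).
    """
    out, cursor = [], 0
    tokens = {}     # (entity_type, value) -> token, for repeat values
    last_id = {}    # entity_type -> last id issued
    blocked = 0
    for start, end, etype in sorted(entities):
        out.append(text[cursor:start])
        value = text[start:end]
        action = policy.get(etype, "tokenize")
        if action == "pass_through":
            out.append(value)        # deliberate: raw value stays visible
        elif action == "block":
            blocked += 1             # removed entirely, nothing emitted
        else:
            # "tokenize", and any unrecognised action, fail safe to a token.
            if (etype, value) not in tokens:
                last_id[etype] = last_id.get(etype, 0) + 1
                tokens[(etype, value)] = "{{%s:%d}}" % (etype, last_id[etype])
            out.append(tokens[(etype, value)])
        cursor = end
    out.append(text[cursor:])
    return "".join(out), blocked
```

The detector never consults `policy`, and `apply_policy` never searches the text itself, so changing a policy alters what happens to an entity without altering what is found.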

Lifecycle of a Protected Value

At each stage, the value can only move forward (detected -> tokenized -> restored). It cannot be "un-tokenized" without the session blob, and it cannot be "un-revoked" once revoked.

Security Guarantees