Security Guarantees

This document states explicitly what OGuardAI guarantees, what it does not guarantee, and what remains the customer's responsibility.

Canonical Terminology

These terms have precise meanings throughout OGuardAI. If you encounter them in APIs, configs, logs, or documentation, they mean exactly this:

| Term | Definition |
| --- | --- |
| Tokenized | A raw PII value has been replaced by a semantic token {{type:id}} in the output text. The original value is stored in the encrypted session blob. Tokenization is deterministic: the same value always produces the same token within a session. |
| Restorable | A tokenized value CAN be restored to its original form using the session state blob. Restorability depends on: (1) the session blob being valid and non-expired, (2) the entity not being revoked, (3) the policy allowing restore for that entity type and output channel. A value that was tokenized is restorable until its session expires or the entity is revoked. |
| Revoked | A value has been permanently marked as non-restorable. During rehydrate, revoked entities return [DELETED] instead of the original value. Revocation is persistent (survives server restart) and uses HMAC-SHA-256 hashing -- no raw PII is stored in the revocation table. Revoking a person cascades to all linked entities (email, phone, address) via belongs_to relationships. |
| Deleted | In OGuardAI's context, deletion means the original value is no longer recoverable by any means. This is achieved by: (1) revoking the entity, (2) allowing the session blob to expire, or (3) dropping the session secret. OGuardAI does NOT delete data in external systems (vector stores, databases, log sinks) -- that is the integrator's responsibility. |
| Blocked | The policy engine has determined that this entity type is not allowed in the given context. Blocked entities are removed from the output entirely (not tokenized, not passed through). The entities_blocked counter tracks this. In the output guard context, "blocked" means the entire response is rejected because new high-sensitivity PII was detected. |
| Masked | A value has been replaced with a type label like [EMAIL] or [SSN]. Masking is irreversible -- the original value cannot be recovered from the mask. Masking occurs in two places: (1) as a restore mode (masked), where the session value is intentionally obscured, and (2) in the output guard, where newly detected PII is replaced with type labels. |
| Redacted | Synonymous with "masked" in OGuardAI. We prefer "masked" in API surfaces and "redacted" only in human-facing documentation. They are functionally identical. |
| Detected | An entity span has been identified by the detection engine (regex, NER, or both). Detection does not imply tokenization -- the policy engine decides what happens to each detected entity (tokenize, block, or pass through). |
| Passed through | The policy engine has determined that this entity type is allowed in the given context. The raw value remains in the output unchanged. This is a deliberate policy decision, not a detection failure. |
| Session state | An AES-256-GCM encrypted blob containing the token map (token-to-original-value mappings), session metadata, and policy reference. The blob is opaque to clients and LLMs. It is the ONLY artifact that can restore tokenized values. |
| Trace ID | A UUID v4 identifier that correlates all operations in a request lifecycle: transform, proxy pass, tool calls, rehydrate, output guard, and revocation. Client-supplied or auto-generated. Use the same trace_id across transform and rehydrate to enable full incident reconstruction. |
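
To make the "Tokenized" definition concrete, here is a minimal sketch of deterministic tokenization within one session. It is an illustration only, not OGuardAI's implementation: the Session class, the transform function, the email regex, and the numeric id scheme are all assumptions, and the in-memory dictionaries stand in for the encrypted session blob.

```python
import re
from dataclasses import dataclass, field

# Simplified email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class Session:
    """Hypothetical in-memory stand-in for the encrypted session blob."""
    token_by_value: dict = field(default_factory=dict)  # (type, raw value) -> token
    value_by_token: dict = field(default_factory=dict)  # token -> raw value
    counters: dict = field(default_factory=dict)        # type -> last id issued

    def tokenize(self, entity_type: str, value: str) -> str:
        key = (entity_type, value)
        if key not in self.token_by_value:
            # First time this value is seen in the session: issue a new id.
            next_id = self.counters.get(entity_type, 0) + 1
            self.counters[entity_type] = next_id
            token = "{{%s:%d}}" % (entity_type, next_id)  # assumed id scheme
            self.token_by_value[key] = token
            self.value_by_token[token] = value
        # Same value, same session: always the same token.
        return self.token_by_value[key]


def transform(text: str, session: Session) -> str:
    """Replace every detected email with its session token."""
    return EMAIL_RE.sub(lambda m: session.tokenize("email", m.group(0)), text)
```

Calling transform twice with the same Session reuses existing tokens, which is what lets a later rehydrate step map each token back to exactly one original value.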

What OGuardAI GUARANTEES

Data Protection

  • Raw PII values NEVER appear in safe_text output (verified by 1,800+ unit tests)
  • Tokenized text uses ONLY the canonical {{type:id}} format
  • Sealed session blobs are encrypted with AES-256-GCM (authenticated encryption)
  • Tampered session blobs are always rejected (cryptographic verification)
  • Expired sessions produce clean errors, never data leaks
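
An integrator who wants to spot-check the first two guarantees in their own test suite could use something like the sketch below. It assumes the integrator has access to a token map (token to original value) in a test environment; the function names and the allowed characters in type and id are assumptions, not part of OGuardAI's API.

```python
import re

# Assumed character set for the canonical {{type:id}} format.
CANONICAL_TOKEN = re.compile(r"\{\{[a-z_]+:[A-Za-z0-9_-]+\}\}")
# Anything wrapped in one or more braces, canonical or not.
BRACED = re.compile(r"\{+[^{}]*\}+")


def leaked_values(safe_text: str, token_map: dict) -> list:
    """Return every original value that still appears verbatim in safe_text."""
    return [value for value in token_map.values() if value in safe_text]


def non_canonical_tokens(safe_text: str) -> list:
    """Return brace-delimited fragments that are not in {{type:id}} form."""
    return [frag for frag in BRACED.findall(safe_text)
            if not CANONICAL_TOKEN.fullmatch(frag)]
```

Both functions should return an empty list for any safe_text produced by transform; a non-empty result is worth investigating.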

Detection

  • Builtin mode (regex): 15 entity types, detected deterministically (the same input always produces the same output)
  • NER mode (GLiNER): 18 entity types, including person, company, and location; detection quality depends on the model
  • Both modes: detection is applied to ALL string content in the configured scan scope

Restore

  • The full restore mode returns a value that is byte-for-byte identical to the original
  • Each of the 6 restore modes produces predictable, documented output
  • Channel-specific restore rules are applied deterministically per policy

Revocation

  • Revoked entity values ALWAYS return [DELETED] during rehydrate
  • Cascade revocation: revoking a person suppresses ALL linked entities (email, phone, address)
  • Revocation is persistent (file or Redis backend)
  • Revocation uses HMAC-SHA-256 -- no raw PII stored in the revocation table

Revocation Contract

The following are the canonical, binding guarantees for OGuardAI's revocation system:

  1. Revocation affects FUTURE rehydrate calls only -- outputs already delivered to end users or downstream systems cannot be clawed back.
  2. Sealed session blobs remain decryptable after revocation, but any revoked value resolves to [DELETED] instead of the original.
  3. Revoking a person entity cascades to every entity linked via belongs_to relationships (email, phone, address) -- all linked entities also resolve to [DELETED].
  4. Multiple entities can be revoked in a single API call (bulk revoke).
  5. Revocation state survives server restart when using the file backend and is shared across instances when using the Redis backend.
  6. Vector stores, external databases, and application caches must still delete their own copies of data -- OGuardAI cannot reach into external systems.
  7. Revocation is irreversible -- there is no "un-revoke" operation.
  8. The revocation table stores only HMAC-SHA-256 hashes of entity values -- no raw PII is ever stored in the revocation table itself.
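
Points 2, 3, 4, and 8 of the contract can be sketched as follows. This is a simplified model under stated assumptions, not OGuardAI's code: RevocationTable, revoke_with_cascade, and rehydrate are hypothetical names, the table lives in memory rather than in a file or Redis, and belongs_to is modelled as a plain token-to-owner dictionary.

```python
import hashlib
import hmac


class RevocationTable:
    """Stores only HMAC-SHA-256 digests of revoked values, never raw PII."""

    def __init__(self, secret: bytes):
        self._secret = secret
        self._revoked = set()  # hex digests only

    def _digest(self, value: str) -> str:
        return hmac.new(self._secret, value.encode("utf-8"),
                        hashlib.sha256).hexdigest()

    def revoke(self, *values: str) -> None:
        """Bulk revoke: any number of values in one call. No un-revoke exists."""
        for value in values:
            self._revoked.add(self._digest(value))

    def is_revoked(self, value: str) -> bool:
        return self._digest(value) in self._revoked


def revoke_with_cascade(table, token_map, belongs_to, token):
    """Revoke one entity plus every entity linked to it via belongs_to."""
    linked = [t for t, owner in belongs_to.items() if owner == token]
    table.revoke(*(token_map[t] for t in [token] + linked))


def rehydrate(text, token_map, table):
    """Restore tokens; revoked values resolve to [DELETED] instead."""
    for token, value in token_map.items():
        replacement = "[DELETED]" if table.is_revoked(value) else value
        text = text.replace(token, replacement)
    return text
```

Note that the token map itself is untouched by revocation, which mirrors point 2: the session blob still decrypts, but the rehydrate step consults the revocation table before emitting any value.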

Session Security

  • Cross-tenant session access is always rejected
  • Key rotation instantly invalidates all existing sealed session blobs
  • Session TTL is enforced -- expired blobs cannot be unsealed
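
The order of checks behind these three guarantees can be illustrated with a deliberately simplified sketch. Important caveat: the code below only signs the payload with HMAC-SHA-256 from the Python standard library; it does NOT encrypt anything, whereas OGuardAI seals blobs with AES-256-GCM. The HMAC tag stands in for the GCM authentication tag so that the tamper, key-rotation, tenant, and TTL checks can be shown without third-party dependencies. All names (seal, unseal, SessionError, the JSON layout) are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


class SessionError(Exception):
    """Raised with a generic message; never echoes blob contents."""


def _tag(key: bytes, body: bytes) -> bytes:
    return hmac.new(key, body, hashlib.sha256).digest()


def seal(token_map, key, tenant, ttl_s, now=None):
    """Sign (not encrypt!) a token map together with tenant and expiry."""
    issued = time.time() if now is None else now
    body = json.dumps({"tenant": tenant, "exp": issued + ttl_s,
                       "map": token_map}).encode("utf-8")
    return base64.urlsafe_b64encode(_tag(key, body) + body).decode("ascii")


def unseal(blob, key, tenant, now=None):
    current = time.time() if now is None else now
    try:
        raw = base64.urlsafe_b64decode(blob.encode("ascii"))
    except ValueError:
        raise SessionError("invalid session blob") from None
    tag, body = raw[:32], raw[32:]
    # 1. Integrity: fails for tampered blobs and for blobs sealed with an old key.
    if not hmac.compare_digest(tag, _tag(key, body)):
        raise SessionError("invalid session blob")
    record = json.loads(body)
    # 2. Tenant isolation: a valid blob is still rejected for another tenant.
    if record["tenant"] != tenant:
        raise SessionError("tenant mismatch")
    # 3. TTL: expired blobs cannot be unsealed.
    if current >= record["exp"]:
        raise SessionError("session expired")
    return record["map"]
```

The integrity check runs first so that tenant and expiry fields are only trusted after they have been authenticated. Because verification depends on the current key, replacing the key makes every previously sealed blob fail at step 1.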

Policy

  • Policy rules are evaluated deterministically for every entity
  • Policy inheritance resolves child > parent > default
  • Policy integrity can be verified via HMAC signatures
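
As a rough illustration of the last two points, the sketch below resolves an action by walking child, then parent, then default, and verifies a policy with an HMAC-SHA-256 signature. The dictionary-based policy shape, the action names, and the canonical-JSON signing scheme are assumptions made for this example; OGuardAI's actual policy format may differ.

```python
import hashlib
import hmac
import json


def resolve_action(entity_type, child, parent, default):
    """Return the first rule found, in the order child > parent > default."""
    for layer in (child, parent, default):
        if entity_type in layer:
            return layer[entity_type]
    raise KeyError(f"no rule for entity type {entity_type!r}")


def sign_policy(policy: dict, key: bytes) -> str:
    """HMAC-SHA-256 over a canonical JSON encoding (sorted keys, no spaces)."""
    canonical = json.dumps(policy, sort_keys=True,
                           separators=(",", ":")).encode("utf-8")
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def verify_policy(policy: dict, signature: str, key: bytes) -> bool:
    """Constant-time comparison against a freshly computed signature."""
    return hmac.compare_digest(sign_policy(policy, key), signature)
```

Because resolution is a fixed-order lookup with no randomness, the same entity type and the same three layers always yield the same action.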

What OGuardAI DOES NOT Guarantee

Detection Completeness

  • No detection system catches 100% of PII in all contexts
  • Person/company/location detection REQUIRES the Python NER sidecar
  • In builtin-only mode, person names, company names, and locations are NOT detected
  • OCR text extraction is best-effort -- noisy scans may produce detection gaps
  • Custom or domain-specific entity types beyond the 15 built-in types (18 with NER) are not detected

External System Deletion

  • OGuardAI does NOT control vector stores, external databases, or log sinks
  • RAG chunk deletion in vector stores is the application's responsibility
  • Log retention and purging is managed by the logging infrastructure
  • OGuardAI provides guidance and signals, but cannot enforce external cleanup

Provider Behavior

  • LLM output quality depends on the provider (OpenAI, Anthropic, etc.)
  • Token damage patterns vary by provider and model version
  • Token repair is best-effort, using a 3-stage pipeline (strict -> repair -> fuzzy)
  • Hallucinated tokens are flagged as unresolved, never fabricated
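
The source does not specify what each repair stage does, so the following is one plausible reading, offered as a sketch only. Here "strict" means an exact token match, "repair" normalises brace, whitespace, and case damage, and "fuzzy" tolerates a misspelled type label while requiring the id to match exactly. The function name, regex, and similarity cutoff are all assumptions.

```python
import difflib
import re

# Matches intact and lightly damaged tokens, e.g. {{email:1}} or { Email : 1 }.
LOOSE_TOKEN = re.compile(r"\{{1,2}\s*([A-Za-z_]+)\s*:\s*(\w+)\s*\}{1,2}")


def rehydrate_with_repair(text: str, token_map: dict):
    """Return (restored_text, unresolved_tokens)."""
    known = {}  # id -> {type: original value}
    for token, value in token_map.items():
        etype, eid = token[2:-2].split(":", 1)
        known.setdefault(eid, {})[etype] = value
    unresolved = []

    def replace(match):
        raw, etype, eid = match.group(0), match.group(1).lower(), match.group(2)
        if raw in token_map:                      # stage 1: strict
            return token_map[raw]
        same_id = known.get(eid, {})
        if etype in same_id:                      # stage 2: repair
            return same_id[etype]
        close = difflib.get_close_matches(etype, list(same_id), n=1, cutoff=0.75)
        if close:                                 # stage 3: fuzzy (type label only)
            return same_id[close[0]]
        unresolved.append(raw)                    # flagged, never fabricated
        return raw

    return LOOSE_TOKEN.sub(replace, text), unresolved
```

Restricting the fuzzy stage to the type label is a deliberate choice in this sketch: matching on ids as well would let a hallucinated {{email:2}} resolve to the value behind {{email:1}}, which would contradict the "never fabricated" rule.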

Performance Under Load

  • Latency depends on detector mode, payload size, and NER sidecar availability
  • NER mode adds 50-500ms per request depending on text length
  • If the NER sidecar is configured but unavailable, each request can wait up to 5s for the sidecar timeout before falling back to builtin regex
  • Rate limiting is per-instance, not shared across instances

Distributed Consistency

  • Sealed sessions are stateless -- work across any number of instances
  • Revocation with file backend is per-instance (use Redis for multi-instance)
  • Rate limiting is per-instance (use API gateway for global limits)
  • Metrics are per-instance (use Prometheus for aggregation)

Customer Responsibilities

| Responsibility | Why |
| --- | --- |
| Start the NER sidecar if person/company/location detection is needed | NER is optional |
| Delete vector store chunks after a RAG delete | OGuardAI doesn't own vector stores |
| Rotate session keys on compromise | Key management is an ops responsibility |
| Configure an appropriate policy for the use case | Policy selection affects protection level |
| Monitor the health endpoint | Detect degraded mode early |
| Set auth.mode to a non-dev value for production | Dev mode bypasses auth |
| Set session.secret to a real secret | OGuardAI warns when the default secret is in use |
| Configure log retention | OGuardAI emits audit events; retention is handled by infrastructure |
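
Two of these responsibilities (auth.mode and session.secret) lend themselves to a deployment preflight check. The sketch below is hypothetical: it assumes the configuration has been loaded into a nested dictionary, that the dev value of auth.mode is the literal string "dev", and that the listed placeholder secrets and the 32-character minimum are reasonable local conventions. None of these details come from OGuardAI's documentation.

```python
def preflight(config: dict,
              insecure_secrets=frozenset({"", "changeme", "default"}),
              min_secret_length: int = 32) -> list:
    """Return a list of human-readable problems; empty means the check passed."""
    problems = []

    # Assumption: a missing auth.mode is treated as dev mode.
    mode = config.get("auth", {}).get("mode", "dev")
    if mode == "dev":
        problems.append("auth.mode is 'dev': authentication is bypassed")

    secret = config.get("session", {}).get("secret", "")
    if secret in insecure_secrets or len(secret) < min_secret_length:
        problems.append("session.secret is missing, a placeholder, or too short")

    return problems
```

A deployment script could refuse to start production traffic whenever preflight returns a non-empty list.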

Failure Modes

| Failure | OGuardAI Behavior |
| --- | --- |
| NER sidecar down | Falls back to builtin regex (person/company/location missed) |
| Invalid session blob | Clean error returned, no data leaked |
| Malformed LLM output | 3-stage token repair attempted, unresolved tokens flagged |
| Policy not found | Default policy applied |
| Output guard catches new PII | Masked or blocked per config |
| Rate limit exceeded | HTTP 429 with Retry-After header |
| File too large | HTTP 400 rejection |
| Revoked entity in rehydrate | Returns [DELETED] |
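
The first row, falling back to builtin regex when the NER sidecar is down, follows a common timeout-and-fallback pattern. The sketch below shows that pattern in general terms; detect_with_fallback and its callable parameters are hypothetical, and the 5-second default simply mirrors the timeout mentioned in this document.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_with_fallback(text, ner_detect, builtin_detect, timeout_s=5.0):
    """Try NER first; on error or timeout, fall back to the builtin detector.

    Returns (entities, mode) so callers can surface degraded mode in health
    checks or metrics.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(ner_detect, text)
    try:
        return future.result(timeout=timeout_s), "ner"
    except Exception:
        # Covers connection errors raised by ner_detect and the timeout itself.
        return builtin_detect(text), "builtin"
    finally:
        # Do not block the request on a hung sidecar call.
        pool.shutdown(wait=False)
```

Returning the mode alongside the result makes it easy to alert when requests are silently running without person, company, and location detection.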

What Depends on NER Mode

Not all features work identically in builtin-only vs NER mode. This matrix clarifies:

| Capability | Builtin (regex) | NER (GLiNER) | Notes |
| --- | --- | --- | --- |
| Email detection | Yes | Yes | Regex in both modes |
| Phone detection | Yes | Yes | Regex in both modes |
| SSN / IBAN / CC | Yes | Yes | Regex in both modes |
| IP / URL / DOB | Yes | Yes | Regex in both modes |
| Order / Ticket | Yes | Yes | Regex in both modes |
| Person name | No | Yes | Requires NER sidecar |
| Company name | No | Yes | Requires NER sidecar |
| Location | No | Yes | Requires NER sidecar |
| Address (structured) | Partial | Yes | Regex detects 7 country formats; NER detects any language |
| Entity linking | Limited | Full | Person-to-entity links need person detection (NER) |
| Cascade revocation | Limited | Full | Cascade from person to linked entities needs NER for person detection |
| Detection latency | Under 5ms | 50-500ms | NER adds model inference time |
| Determinism | 100% | Model-dependent | Same input may produce slightly different NER confidence scores across model versions |

Bottom line: Builtin mode is fast and deterministic but misses person/company/location. NER mode catches more entity types but adds latency and model dependency.

What Depends on Policy

| Feature | Default Policy | Strict PII Policy | Enterprise Policy |
| --- | --- | --- | --- |
| Tokenize email | Yes | Yes | Yes |
| Tokenize phone | Yes | Yes | Yes |
| Block SSN | No | Yes | Per-channel |
| Block IBAN | No | Yes | Per-channel |
| Pass through URL | Yes | No | Per-channel |
| Restore mode | Full | Masked | Per-channel |
| Output guard action | Mask | Block | Per-entity-type |
| Cascade revocation | Available | Available | Available |
| Shadow mode | Config-level | Config-level | Config-level |

Bottom line: Policy controls what happens to detected entities. Detection is independent of policy -- it runs first and reports every entity it finds. Policy then decides for each one: tokenize, block, or pass through.
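
The separation between detection and policy can be sketched as a function that receives already-detected spans and applies one of the three actions to each. The Span type, the apply_policy signature, the action strings, and the tokenize-by-default fallback for unlisted types are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start: int          # inclusive character offset
    end: int            # exclusive character offset
    entity_type: str


def apply_policy(text, spans, policy, tokenize):
    """Apply tokenize / block / pass_through to each detected span.

    Returns (output_text, entities_blocked).
    """
    out, cursor, blocked = [], 0, 0
    for span in sorted(spans, key=lambda s: s.start):
        out.append(text[cursor:span.start])
        raw = text[span.start:span.end]
        # Assumption: entity types without a rule are tokenized.
        action = policy.get(span.entity_type, "tokenize")
        if action == "tokenize":
            out.append(tokenize(span.entity_type, raw))
        elif action == "block":
            blocked += 1            # removed from the output entirely
        elif action == "pass_through":
            out.append(raw)         # deliberate policy decision
        else:
            raise ValueError(f"unknown action {action!r}")
        cursor = span.end
    out.append(text[cursor:])
    return "".join(out), blocked
```

Swapping the policy dictionary changes the output without touching the detector, which is the independence the paragraph above describes.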

Lifecycle of a Protected Value

At each stage, the value can only move forward (detected -> tokenized -> restored). It cannot be "un-tokenized" without the session blob, and it cannot be "un-revoked" once revoked.