OGuardAI Documentation

Semantic data protection runtime for AI systems

What is OGuardAI?

OGuardAI is a semantic data protection and transformation runtime for AI systems. It detects sensitive entities in user input and replaces each one with a semantic token of the form {{type:id}} (e.g., {{email:e7a3}}). The tokens preserve safe metadata, so the LLM retains context without seeing real data. Afterward, OGuardAI deterministically restores the original values in the LLM output.
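To make the round trip concrete, here is a minimal sketch of the idea in Python. It is not OGuardAI's API: the function names `tokenize` and `rehydrate`, the in-memory `vault` dictionary, and the regex-based email detector are all assumptions chosen for illustration, and only the {{type:id}} token shape comes from the description above.

```python
import re
import secrets

# Assumption: a simple email regex stands in for the runtime's real detectors.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches tokens of the documented shape {{type:id}}.
TOKEN_RE = re.compile(r"\{\{(\w+):(\w+)\}\}")


def tokenize(text: str, vault: dict[str, str]) -> str:
    """Replace each email with an {{email:id}} token; keep the raw value in vault."""
    seen = {raw: key for key, raw in vault.items()}  # reuse tokens for repeated values

    def _swap(match: re.Match) -> str:
        raw = match.group(0)
        if raw not in seen:
            key = "email:" + secrets.token_hex(2)  # 4 hex characters
            while key in vault:  # avoid the unlikely id collision
                key = "email:" + secrets.token_hex(2)
            vault[key] = raw
            seen[raw] = key
        return "{{" + seen[raw] + "}}"

    return EMAIL_RE.sub(_swap, text)


def rehydrate(text: str, vault: dict[str, str]) -> str:
    """Swap known tokens back to their original values; leave unknown tokens as-is."""

    def _restore(match: re.Match) -> str:
        key = f"{match.group(1)}:{match.group(2)}"
        return vault.get(key, match.group(0))

    return TOKEN_RE.sub(_restore, text)


vault: dict[str, str] = {}
safe = tokenize("Reply to anna@example.com and cc anna@example.com.", vault)
restored = rehydrate(safe, vault)
```

In this sketch, `safe` is the only string that would be sent to the LLM, and the same address always maps to the same token, so the model can tell that both mentions refer to one entity.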

Core Flow

Detect → Tokenize → Transform → LLM → Rehydrate

Why OGuardAI Exists

AI adoption in enterprises is blocked by a false choice: use AI unsafely with raw sensitive data, or strip data until AI becomes useless. OGuardAI eliminates this choice.

The core insight: replace raw values with typed semantic tokens that carry enough context (gender, formality, language) for LLMs to generate correct, personalized output, then deterministically restore only what policy allows.
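One way to picture a token that carries context is as a small record: a type, an id, and a dictionary of safe attributes. The sketch below is hypothetical. The `SemanticToken` class, its field names, and the `describe_tokens` helper are assumptions for illustration and are not taken from OGuardAI.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SemanticToken:
    """Hypothetical token record: no raw value, only type, id, and safe metadata."""
    kind: str
    token_id: str
    metadata: dict[str, str] = field(default_factory=dict)

    def placeholder(self) -> str:
        # Render in the documented {{type:id}} shape.
        return "{{" + self.kind + ":" + self.token_id + "}}"


def describe_tokens(tokens: list[SemanticToken]) -> str:
    """Build a context note for the LLM prompt that lists only safe metadata."""
    lines = []
    for token in tokens:
        attrs = ", ".join(f"{k}={v}" for k, v in sorted(token.metadata.items()))
        lines.append(f"{token.placeholder()}: {attrs}")
    return "\n".join(lines)


customer = SemanticToken(
    kind="person",
    token_id="p1",
    metadata={"gender": "female", "formality": "formal", "language": "de"},
)
context_note = describe_tokens([customer])
# context_note == "{{person:p1}}: formality=formal, gender=female, language=de"
```

With a note like this in the prompt, a model could choose a formal German salutation for {{person:p1}} without ever seeing the customer's name.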

Architecture Difference

Traditional PII Tools (Presidio / AWS Comprehend)

Input → Detect → Mask → (data destroyed)

The original values are gone. The masked text goes to the LLM, and the LLM response cannot reference any real data.

OGuardAI

Input → Detect → Tokenize → Transform → [LLM] → Restore

Sensitive values are replaced with typed semantic tokens. The tokens carry safe metadata so LLMs generate contextually correct output. After the LLM responds, OGuardAI deterministically restores the original values based on policy.
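The final step, restoring "based on policy", can be sketched as a filter over entity types. This is an assumption about what such a policy could look like, not OGuardAI's actual policy model: here a policy is just a set of entity types that may be restored, and `restore_with_policy` is a hypothetical helper.

```python
import re

# Matches tokens of the documented shape {{type:id}}.
TOKEN_RE = re.compile(r"\{\{(\w+):(\w+)\}\}")


def restore_with_policy(
    llm_output: str,
    vault: dict[tuple[str, str], str],
    allowed_types: set[str],
) -> str:
    """Restore only tokens whose type the policy allows; leave the rest tokenized."""

    def _restore(match: re.Match) -> str:
        kind, token_id = match.group(1), match.group(2)
        if kind not in allowed_types:
            return match.group(0)  # policy forbids restoring this type
        return vault.get((kind, token_id), match.group(0))

    return TOKEN_RE.sub(_restore, llm_output)


# Hypothetical vault contents, keyed by (type, id).
vault = {
    ("person", "p1"): "Frau Müller",
    ("iban", "b2"): "DE00 0000 0000 0000 0000 00",
}
llm_output = "Sehr geehrte {{person:p1}}, Ihr Konto {{iban:b2}} wurde geprüft."
reply = restore_with_policy(llm_output, vault, allowed_types={"person"})
# reply == "Sehr geehrte Frau Müller, Ihr Konto {{iban:b2}} wurde geprüft."
```

Because restoration is a lookup rather than a model call, the same output, vault, and policy always produce the same reply.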

Key Scenarios

  • Customer support: German formal replies with gender-aware restoration
  • RAG: Sanitize document ingestion and query-time context
  • CRM copilot: Draft emails and summaries without exposing customer data
  • Agentic workflows: Govern multi-step tool calls with per-step sanitization
  • Document AI: Summarize email threads, contracts, complaints safely
  • Regulated teams: Enable AI in legal, HR, finance, healthcare departments

Trust Boundary

Raw PII exists only inside the OGuardAI runtime (Trusted Zone). LLMs, external tools, logs, and vector stores (Untrusted Zone) only ever see tokenized text and safe metadata.
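A boundary like this can be enforced with a check at every exit point from the trusted zone. The sketch below is illustrative only: `find_leaks` and `send_to_untrusted` are hypothetical names, and a plain substring scan is an assumed, simplistic stand-in for whatever enforcement the runtime actually performs.

```python
from typing import Callable


def find_leaks(payload: str, vault: dict[str, str]) -> list[str]:
    """Return the token keys whose raw values appear verbatim in the payload."""
    return [key for key, raw in vault.items() if raw and raw in payload]


def send_to_untrusted(
    payload: str,
    vault: dict[str, str],
    sink: Callable[[str], None],
) -> None:
    """Pass the payload to an untrusted sink (LLM, log, vector store) only if clean."""
    leaks = find_leaks(payload, vault)
    if leaks:
        raise ValueError(f"raw values for {leaks} would cross the trust boundary")
    sink(payload)


# Hypothetical vault and sink for demonstration.
vault = {"email:e7a3": "anna@example.com"}
outbox: list[str] = []

send_to_untrusted("Draft a reply to {{email:e7a3}}.", vault, outbox.append)

try:
    send_to_untrusted("Draft a reply to anna@example.com.", vault, outbox.append)
except ValueError:
    blocked = True
else:
    blocked = False
```

Only the tokenized message reaches `outbox`; the second call is rejected before the sink is ever invoked.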