# Healthcare & HIPAA
Protect PHI when using AI for clinical notes, patient summaries, and medical Q&A -- full HIPAA compliance without sacrificing AI quality
## The Problem
Healthcare organizations face a regulatory wall when adopting AI. Clinical workflows generate enormous volumes of text -- intake forms, discharge summaries, referral letters, insurance pre-authorizations -- that benefit from AI-assisted drafting, summarization, and coding. But every document contains protected health information: patient names, medical record numbers, health plan IDs, dates of birth, addresses, and phone numbers.
HIPAA's Security Rule and Privacy Rule impose strict requirements on how PHI is handled. Sending raw PHI to a third-party LLM provider creates a compliance gap that most organizations cannot close with a Business Associate Agreement alone. The risk calculus is straightforward: a single breach involving PHI triggers mandatory notification, OCR investigation, and potential penalties of up to $2.1 million per violation category per year.
The alternatives are worse. On-premises models are expensive to operate and lag behind commercial models in quality. Manual redaction is slow and error-prone. Regex stripping destroys clinical context -- removing all names from a multi-patient summary makes the output unusable.
## How OGuardAI Solves It
OGuardAI sits between your clinical application and the AI model. PHI is detected, replaced with semantic tokens that carry safe metadata, and the tokenized text is sent to the LLM. The model never sees real identifiers but retains enough context (age range, gender, provider name) to generate clinically useful output. After the model responds, OGuardAI restores the original values based on the output channel's policy.
```
EHR / Clinical App
        |
        v
OGuardAI Runtime (PHI exists only here, transiently)
        |
        +---> Tokenized text (no PHI) ---> LLM Provider
        +---> Encrypted session blob ----> Your Application
        |
        v
Restored output (channel-specific)
        +---> Physician review: full restore
        +---> Patient portal: masked identifiers
        +---> Audit log: no real values
```

## Detected Entity Types
| Entity Type | Examples | Default Action |
|---|---|---|
| `person` | Patient names, provider names | Tokenize with gender/language metadata |
| `health_id` | Medical record numbers, health plan IDs | Tokenize |
| `date_of_birth` | DOB fields, ages | Tokenize with age_range metadata |
| `ssn` | Social Security numbers | Block (never reaches the model) |
| `address` | Street addresses, ZIP codes | Abstract |
| `phone` | Phone and fax numbers | Tokenize |
| `email` | Patient and provider email addresses | Tokenize |
| `insurance_id`* | Payer IDs, group numbers | Tokenize |
Entity types marked with * are custom types defined via policy rules, not built-in. See the Extending Entities guide for how to add custom types.
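As a rough illustration only -- the real syntax is documented in the Extending Entities guide, and the section and field names below are hypothetical -- a regex-backed custom type might be declared along these lines:

```yaml
# Hypothetical sketch; consult the Extending Entities guide for the actual schema.
custom_entities:
  - entity_type: "insurance_id"
    description: "Payer member IDs and group numbers"
    patterns:
      - regex: '\bAET-\d{6}\b'    # e.g. the Aetna ID AET-992041 used below
```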
## Example Policy
```yaml
name: healthcare-hipaa
version: "1.0"
description: "HIPAA-compliant PHI protection for clinical AI workflows"

rules:
  - entity_type: "person"
    protection_level: 2
    action: "tokenize"
    conditions: []
  - entity_type: "health_id"
    protection_level: 2
    action: "tokenize"
    conditions: []
  - entity_type: "date_of_birth"
    protection_level: 2
    action: "tokenize"
    conditions: []
  - entity_type: "ssn"
    protection_level: 1
    action: "block"
    conditions: []
  - entity_type: "address"
    protection_level: 3
    action: "abstract"
    conditions: []
  - entity_type: "phone"
    protection_level: 2
    action: "tokenize"
    conditions: []
  - entity_type: "email"
    protection_level: 2
    action: "tokenize"
    conditions: []
  - entity_type: "insurance_id"
    protection_level: 2
    action: "tokenize"
    conditions: []

defaults:
  protection_level: 2
  action: "tokenize"
  restore_mode: "partial"

channel_rules:
  physician_review:
    person: { restore_mode: full }
    health_id: { restore_mode: full }
    date_of_birth: { restore_mode: full }
    insurance_id: { restore_mode: full }
    phone: { restore_mode: full }
    email: { restore_mode: full }
    address: { restore_mode: full }
  patient_portal:
    person: { restore_mode: full }
    health_id: { restore_mode: masked }
    date_of_birth: { restore_mode: masked }
    insurance_id: { restore_mode: masked }
    phone: { restore_mode: masked }
    email: { restore_mode: masked }
    address: { restore_mode: masked }
  audit_log:
    person: { restore_mode: none }
    health_id: { restore_mode: none }
    date_of_birth: { restore_mode: none }
    insurance_id: { restore_mode: none }
    phone: { restore_mode: none }
    email: { restore_mode: none }
    address: { restore_mode: none }
```

## Example API Call
### Transform a clinical note
```bash
curl -X POST http://localhost:3000/v1/transform \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-key-here" \
  -d '{
    "input": "Patient: Sarah Chen, MRN: MRN-0048271, DOB: 06/14/1978. Presenting with persistent lower back pain radiating to left leg for 3 weeks. Previous lumbar MRI (2024) showed L4-L5 disc herniation. Contact: (415) 555-0193, sarah.chen@email.com. Insurance: Aetna ID AET-992041.",
    "policy": "healthcare-hipaa"
  }'
```

### Response (tokenized)
```json
{
  "safe_text": "Patient: {{person:p_001}}, MRN: {{health_id:h_001}}, DOB: {{date_of_birth:d_001}}. Presenting with persistent lower back pain radiating to left leg for 3 weeks. Previous lumbar MRI (2024) showed L4-L5 disc herniation. Contact: {{phone:ph_001}}, {{email:e_001}}. Insurance: Aetna ID {{insurance_id:i_001}}.",
  "session_id": "01916a3e-7b2c-7000-8000-000000000001",
  "session_state": "eyJ2IjoxLCJzaWQiOi...",
  "entity_context": [
    { "token": "{{person:p_001}}", "type": "person", "gender": "female", "language": "en" },
    { "token": "{{date_of_birth:d_001}}", "type": "date_of_birth", "age_range": "45-50" }
  ],
  "stats": { "entities_detected": 6, "entities_transformed": 6, "entities_blocked": 0 }
}
```

The clinical details -- symptoms, imaging history, diagnosis -- pass through unchanged. The LLM receives enough context to generate a useful clinical summary without ever seeing the patient's real name, MRN, or contact information.
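Between the transform and rehydrate calls, your application forwards `safe_text` to the model of your choice. A minimal sketch of that middle step, assuming an OpenAI-compatible chat completions endpoint and `jq` on the path -- the endpoint, model name, prompt, and file names are illustrative placeholders, not part of OGuardAI:

```bash
# Extract the tokenized text from the saved transform response and forward it
# to the LLM. No PHI crosses this boundary -- the model sees only tokens.
SAFE_TEXT=$(jq -r '.safe_text' transform_response.json)

curl -s https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d "$(jq -n --arg text "$SAFE_TEXT" '{
        model: "gpt-4o",
        messages: [
          {role: "system",
           content: "Draft a concise clinical summary. Preserve every {{...}} token exactly as written."},
          {role: "user", content: $text}
        ]
      }')" \
  | jq -r '.choices[0].message.content' > llm_output.txt
```

The instruction to preserve tokens verbatim matters: tokens the model drops or rewrites cannot be matched and restored during rehydration.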
### Rehydrate for the physician
```bash
curl -X POST http://localhost:3000/v1/rehydrate \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-key-here" \
  -d '{
    "output": "<LLM-generated summary with tokens>",
    "session_state": "eyJ2IjoxLCJzaWQiOi...",
    "output_channel": "physician_review"
  }'
```

The physician sees the complete summary with all identifiers restored. The patient portal shows masked MRN and contact details. The audit log contains no real PHI.
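Because the encrypted session state is channel-agnostic, the same model output can be rehydrated once per audience. A sketch reusing the illustrative file names from the previous step -- which identifiers come back, and in what form, is decided entirely by the policy's `channel_rules`:

```bash
# Rehydrate one LLM output for each output channel defined in the policy.
SESSION_STATE=$(jq -r '.session_state' transform_response.json)

for channel in physician_review patient_portal audit_log; do
  curl -s -X POST http://localhost:3000/v1/rehydrate \
    -H "Content-Type: application/json" \
    -H "X-API-Key: your-key-here" \
    -d "$(jq -n --arg out "$(cat llm_output.txt)" \
                --arg state "$SESSION_STATE" \
                --arg ch "$channel" \
          '{output: $out, session_state: $state, output_channel: $ch}')" \
    > "summary_${channel}.json"
done
```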
## HIPAA Compliance Notes
| HIPAA Requirement | How OGuardAI Addresses It |
|---|---|
| Minimum Necessary (164.502(b)) | Policy engine enforces entity-level data minimization -- entities are tokenized with only the metadata the AI task needs, and high-risk identifiers such as SSNs are blocked outright |
| Access Controls (164.312(a)) | Output channels enforce role-based restore modes -- physicians see full data, patients see masked data, logs see nothing |
| Encryption (164.312(a)(2)(iv)) | Session state is AES-256-GCM encrypted; PHI exists only transiently in RAM during processing |
| Audit Controls (164.312(b)) | Every transform and rehydrate operation emits structured audit events with entity types and counts -- never raw PHI |
| Transmission Security (164.312(e)) | PHI never crosses the trust boundary to the LLM provider; only tokenized text is transmitted |
| BAA Simplification | Because raw PHI never reaches the LLM provider, the BAA scope with that provider is simplified or eliminated |
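To make the audit-controls row concrete: an audit event records entity types and counts, never values. The exact schema is deployment-specific, so treat this shape as a hypothetical illustration built from the transform example above:

```json
{
  "event": "transform",
  "timestamp": "2025-06-14T09:32:11Z",
  "session_id": "01916a3e-7b2c-7000-8000-000000000001",
  "policy": "healthcare-hipaa",
  "entities_detected": { "person": 1, "health_id": 1, "date_of_birth": 1, "phone": 1, "email": 1, "insurance_id": 1 },
  "entities_blocked": 0
}
```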
OGuardAI does not replace a comprehensive HIPAA compliance program. It is one technical control within the Covered Entity's broader administrative, physical, and technical safeguard framework. Consult your compliance officer and legal counsel for your specific deployment.
## Related Resources
- Healthcare Case Study -- Step-by-step walkthrough of a patient intake workflow
- GDPR Compliance -- GDPR-specific compliance documentation
- Compliance Controls Mapping -- HIPAA, GDPR, SOC 2, PCI DSS article-by-article mapping
- Policy Authoring Guide -- How to write and customize policies