# Healthcare & HIPAA
Protect PHI when using AI for clinical notes, patient summaries, and medical Q&A -- full HIPAA compliance without sacrificing AI quality
## The Problem
Healthcare organizations face a regulatory wall when adopting AI. Clinical workflows generate enormous volumes of text -- intake forms, discharge summaries, referral letters, insurance pre-authorizations -- that benefit from AI-assisted drafting, summarization, and coding. But every document contains protected health information: patient names, medical record numbers, health plan IDs, dates of birth, addresses, and phone numbers.
HIPAA's Security Rule and Privacy Rule impose strict requirements on how PHI is handled. Sending raw PHI to a third-party LLM provider creates a compliance gap that most organizations cannot close with a Business Associate Agreement alone. The risk calculus is straightforward: a single breach involving PHI triggers mandatory notification, OCR investigation, and potential penalties of up to $2.1 million per violation category per year.
The alternatives are worse. On-premises models are expensive to operate and lag behind commercial models in quality. Manual redaction is slow and error-prone. Regex stripping destroys clinical context -- removing all names from a multi-patient summary makes the output unusable.
## How OGuardAI Solves It
OGuardAI sits between your clinical application and the AI model. PHI is detected, replaced with semantic tokens that carry safe metadata, and the tokenized text is sent to the LLM. The model never sees real identifiers but retains enough context (age range, gender, provider name) to generate clinically useful output. After the model responds, OGuardAI restores the original values based on the output channel's policy.
```
EHR / Clinical App
        |
        v
OGuardAI Runtime (PHI exists only here, transiently)
        |
        +---> Tokenized text (no PHI) ---> LLM Provider
        +---> Encrypted session blob ----> Your Application
        |
        v
Restored output (channel-specific)
        +---> Physician review: full restore
        +---> Patient portal: masked identifiers
        +---> Audit log: no real values
```

## Detected Entity Types
| Entity Type | Examples | Default Action |
|---|---|---|
| `person` | Patient names, provider names | Tokenize with gender/language metadata |
| `health_id` | Medical record numbers, health plan IDs | Tokenize |
| `date_of_birth` | DOB fields, ages | Tokenize with age_range metadata |
| `ssn` | Social Security numbers | Block (never reaches the model) |
| `address` | Street addresses, ZIP codes | Abstract |
| `phone` | Phone and fax numbers | Tokenize |
| `email` | Patient and provider email addresses | Tokenize |
| `insurance_id`* | Payer IDs, group numbers | Tokenize |
Entity types marked with * are custom types defined via policy rules, not built-in. See the Extending Entities guide for how to add custom types.
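As a rough illustration only -- the real syntax is documented in the Extending Entities guide, and the section and field names below are hypothetical -- a regex-backed custom type might be declared along these lines:

```yaml
# Hypothetical sketch; consult the Extending Entities guide for the actual schema.
custom_entities:
  - entity_type: "insurance_id"
    description: "Payer member IDs and group numbers"
    patterns:
      - regex: '\bAET-\d{6}\b'    # e.g. the Aetna ID AET-992041 used below
```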
## Example Policy
```yaml
name: healthcare-hipaa
version: "1.0"
description: "HIPAA-compliant PHI protection for clinical AI workflows"

rules:
  - entity_type: "person"
    protection_level: 2
    action: "tokenize"
    conditions: []
  - entity_type: "health_id"
    protection_level: 2
    action: "tokenize"
    conditions: []
  - entity_type: "date_of_birth"
    protection_level: 2
    action: "tokenize"
    conditions: []
  - entity_type: "ssn"
    protection_level: 1
    action: "block"
    conditions: []
  - entity_type: "address"
    protection_level: 3
    action: "abstract"
    conditions: []
  - entity_type: "phone"
    protection_level: 2
    action: "tokenize"
    conditions: []
  - entity_type: "email"
    protection_level: 2
    action: "tokenize"
    conditions: []
  - entity_type: "insurance_id"
    protection_level: 2
    action: "tokenize"
    conditions: []

defaults:
  protection_level: 2
  action: "tokenize"
  restore_mode: "partial"

channel_rules:
  physician_review:
    person: { restore_mode: full }
    health_id: { restore_mode: full }
    date_of_birth: { restore_mode: full }
    insurance_id: { restore_mode: full }
    phone: { restore_mode: full }
    email: { restore_mode: full }
    address: { restore_mode: full }
  patient_portal:
    person: { restore_mode: full }
    health_id: { restore_mode: masked }
    date_of_birth: { restore_mode: masked }
    insurance_id: { restore_mode: masked }
    phone: { restore_mode: masked }
    email: { restore_mode: masked }
    address: { restore_mode: masked }
  audit_log:
    person: { restore_mode: none }
    health_id: { restore_mode: none }
    date_of_birth: { restore_mode: none }
    insurance_id: { restore_mode: none }
    phone: { restore_mode: none }
    email: { restore_mode: none }
    address: { restore_mode: none }
```

## Example API Call
### Transform a clinical note
```bash
curl -X POST http://localhost:3000/v1/transform \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-key-here" \
  -d '{
    "input": "Patient: Sarah Chen, MRN: MRN-0048271, DOB: 06/14/1978. Presenting with persistent lower back pain radiating to left leg for 3 weeks. Previous lumbar MRI (2024) showed L4-L5 disc herniation. Contact: (415) 555-0193, sarah.chen@email.com. Insurance: Aetna ID AET-992041.",
    "policy": "healthcare-hipaa"
  }'
```

### Response (tokenized)
```json
{
  "safe_text": "Patient: {{person:p_001}}, MRN: {{health_id:h_001}}, DOB: {{date_of_birth:d_001}}. Presenting with persistent lower back pain radiating to left leg for 3 weeks. Previous lumbar MRI (2024) showed L4-L5 disc herniation. Contact: {{phone:ph_001}}, {{email:e_001}}. Insurance: Aetna ID {{insurance_id:i_001}}.",
  "session_id": "01916a3e-7b2c-7000-8000-000000000001",
  "session_state": "eyJ2IjoxLCJzaWQiOi...",
  "entity_context": [
    { "token": "{{person:p_001}}", "type": "person", "gender": "female", "language": "en" },
    { "token": "{{date_of_birth:d_001}}", "type": "date_of_birth", "age_range": "45-50" }
  ],
  "stats": { "entities_detected": 6, "entities_transformed": 6, "entities_blocked": 0 }
}
```

The clinical details -- symptoms, imaging history, diagnosis -- pass through unchanged. The LLM receives enough context to generate a useful clinical summary without ever seeing the patient's real name, MRN, or contact information.
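Between the transform and rehydrate calls, your application forwards `safe_text` to the model of your choice. A minimal sketch of that middle step, assuming an OpenAI-compatible chat completions endpoint and `jq` on the path -- the endpoint, model name, prompt, and file names are illustrative placeholders, not part of OGuardAI:

```bash
# Extract the tokenized text from the saved transform response and forward it
# to the LLM. No PHI crosses this boundary -- the model sees only tokens.
SAFE_TEXT=$(jq -r '.safe_text' transform_response.json)

curl -s https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d "$(jq -n --arg text "$SAFE_TEXT" '{
        model: "gpt-4o",
        messages: [
          {role: "system",
           content: "Draft a concise clinical summary. Preserve every {{...}} token exactly as written."},
          {role: "user", content: $text}
        ]
      }')" \
  | jq -r '.choices[0].message.content' > llm_output.txt
```

The instruction to preserve tokens verbatim matters: tokens the model drops or rewrites cannot be matched and restored during rehydration.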
### Rehydrate for the physician
```bash
curl -X POST http://localhost:3000/v1/rehydrate \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-key-here" \
  -d '{
    "output": "<LLM-generated summary with tokens>",
    "session_state": "eyJ2IjoxLCJzaWQiOi...",
    "output_channel": "physician_review"
  }'
```

The physician sees the complete summary with all identifiers restored. The patient portal shows masked MRN and contact details. The audit log contains no real PHI.
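Because the encrypted session state is channel-agnostic, the same model output can be rehydrated once per audience. A sketch reusing the illustrative file names from the previous step -- which identifiers come back, and in what form, is decided entirely by the policy's `channel_rules`:

```bash
# Rehydrate one LLM output for each output channel defined in the policy.
SESSION_STATE=$(jq -r '.session_state' transform_response.json)

for channel in physician_review patient_portal audit_log; do
  curl -s -X POST http://localhost:3000/v1/rehydrate \
    -H "Content-Type: application/json" \
    -H "X-API-Key: your-key-here" \
    -d "$(jq -n --arg out "$(cat llm_output.txt)" \
                --arg state "$SESSION_STATE" \
                --arg ch "$channel" \
          '{output: $out, session_state: $state, output_channel: $ch}')" \
    > "summary_${channel}.json"
done
```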
## HIPAA Compliance Notes
| HIPAA Requirement | How OGuardAI Addresses It |
|---|---|
| Minimum Necessary (164.502(b)) | Policy engine enforces entity-level data minimization -- entities are tokenized with only the metadata the AI task needs, and high-risk identifiers such as SSNs are blocked outright |
| Access Controls (164.312(a)) | Output channels enforce role-based restore modes -- physicians see full data, patients see masked data, logs see nothing |
| Encryption (164.312(a)(2)(iv)) | Session state is AES-256-GCM encrypted; PHI exists only transiently in RAM during processing |
| Audit Controls (164.312(b)) | Every transform and rehydrate operation emits structured audit events with entity types and counts -- never raw PHI |
| Transmission Security (164.312(e)) | PHI never crosses the trust boundary to the LLM provider; only tokenized text is transmitted |
| BAA Simplification | Because raw PHI never reaches the LLM provider, the BAA scope with that provider is simplified or eliminated |
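To make the audit-controls row concrete: an audit event records entity types and counts, never values. The exact schema is deployment-specific, so treat this shape as a hypothetical illustration built from the transform example above:

```json
{
  "event": "transform",
  "timestamp": "2025-06-14T09:32:11Z",
  "session_id": "01916a3e-7b2c-7000-8000-000000000001",
  "policy": "healthcare-hipaa",
  "entities_detected": { "person": 1, "health_id": 1, "date_of_birth": 1, "phone": 1, "email": 1, "insurance_id": 1 },
  "entities_blocked": 0
}
```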
OGuardAI does not replace a comprehensive HIPAA compliance program. It is one technical control within the Covered Entity's broader administrative, physical, and technical safeguard framework. Consult your compliance officer and legal counsel for your specific deployment.
## Related Resources
- Healthcare Case Study -- Step-by-step walkthrough of a patient intake workflow
- GDPR Compliance -- GDPR-specific compliance documentation
- Compliance Controls Mapping -- HIPAA, GDPR, SOC 2, PCI DSS article-by-article mapping
- Policy Authoring Guide -- How to write and customize policies