Protect attorney-client privilege, client identities, and confidential case data when using AI for contract review, legal research, and document summarization.

The Problem

Law firms and corporate legal departments are under pressure to adopt AI for contract review, due diligence, legal research, and document summarization. The productivity gains are substantial -- AI can reduce first-pass contract review time by 60-80%. But legal data carries unique sensitivity requirements that go beyond standard PII protection.

Attorney-client privilege demands that client identities, case details, and legal strategy remain confidential. Sending a contract containing party names, case references, and deal terms to a third-party LLM creates a privilege waiver risk. Even if the LLM provider's terms of service disclaim data use for training, the act of transmitting privileged information to a third party may be sufficient to challenge privilege in litigation.

Beyond privilege, legal documents contain a dense concentration of sensitive data: party names, counterparty identifiers, contract values, governing law clauses, and custom identifiers like matter numbers and court case IDs. Generic PII detection misses domain-specific patterns (matter numbers, case citations, deal codes), and naive redaction destroys the relational structure that makes legal analysis possible.

How OGuardAI Solves It

OGuardAI sits between your legal application and the AI model. Client-identifying information is tokenized with semantic metadata, preserving the document's logical structure while removing real identifiers. The AI model can analyze contract clauses, compare terms, and summarize obligations without knowing who the parties are.

Legal Application (DMS, CLM, eDiscovery)
    |
    v
OGuardAI Runtime (privileged data exists only here, transiently)
    |
    +---> Tokenized text (no client data) ---> LLM Provider
    +---> Encrypted session blob ------------> Your Application
    |
    v
Restored output (channel-specific)
    +---> Attorney review: full restore
    +---> Client report: formatted names
    +---> Court filing: redacted
    +---> Knowledge base: abstract identifiers

Detected Entity Types

Entity Type	Examples	Default Action
`person`	Client names, counterparty names, attorneys	Tokenize with role metadata
`email`	Attorney and client email addresses	Tokenize
`address`	Office addresses, registered agent addresses	Tokenize
`phone`	Direct lines, mobile numbers	Tokenize
`custom`*	Matter numbers, case IDs, deal codes	Tokenize (configurable patterns)
`company`*	Company names, firm names	Tokenize with role metadata
`date_of_birth`	Client DOB in estate/family law matters	Tokenize
`ssn`	Client SSN in tax/estate matters	Block

Entity types marked with * are custom types defined via policy rules, not built-in. See the Extending Entity Types guide for how to add custom types.

The custom entity type supports configurable regex patterns for domain-specific identifiers. Legal teams can define patterns for their matter numbering scheme (e.g., MTR-\d{4}-\d{6}), court case numbers (e.g., \d{2}-cv-\d{5}), or internal deal codes.
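
It can help to try such patterns against sample text before adding them to a policy. The following is a minimal sketch in Python that uses the two example formats above. The `CUSTOM_PATTERNS` table and the `find_custom_identifiers` helper are hypothetical names for illustration only; they are not part of OGuardAI's API or policy syntax, and the added word boundaries are an assumption.

```python
import re

# Hypothetical pattern table; the regexes are the two examples from this guide.
# The \b word boundaries are an added assumption to avoid matching inside longer strings.
CUSTOM_PATTERNS = {
    "matter_number": re.compile(r"\bMTR-\d{4}-\d{6}\b"),
    "court_case_number": re.compile(r"\b\d{2}-cv-\d{5}\b"),
}


def find_custom_identifiers(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) pairs in order of appearance."""
    hits = []
    for name, pattern in CUSTOM_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), name, match.group()))
    hits.sort()  # order by position in the text
    return [(name, value) for _, name, value in hits]


sample = "Matter: MTR-2026-004817, related to case 24-cv-01234."
print(find_custom_identifiers(sample))
```

Running patterns against a set of real (or realistic) matter references in this way is a cheap check for both missed identifiers and false positives before the pattern goes into production.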

Example Policy

name: legal-privilege
version: "1.0"
description: "Attorney-client privilege protection for legal AI workflows"

rules:
  - entity_type: "person"
    protection_level: 2
    action: "tokenize"
    conditions: []

  - entity_type: "company"
    protection_level: 2
    action: "tokenize"
    conditions: []

  - entity_type: "email"
    protection_level: 2
    action: "tokenize"
    conditions: []

  - entity_type: "address"
    protection_level: 2
    action: "tokenize"
    conditions: []

  - entity_type: "phone"
    protection_level: 2
    action: "tokenize"
    conditions: []

  - entity_type: "iban"
    protection_level: 1
    action: "block"
    conditions: []

  - entity_type: "credit_card"
    protection_level: 1
    action: "block"
    conditions: []

  - entity_type: "ssn"
    protection_level: 1
    action: "block"
    conditions: []

  - entity_type: "passport"
    protection_level: 1
    action: "block"
    conditions: []

defaults:
  protection_level: 2
  action: "tokenize"
  restore_mode: "full"

channel_rules:
  attorney_review:
    person:       { restore_mode: full }
    email:        { restore_mode: full }
    address:      { restore_mode: full }
    phone:        { restore_mode: full }
    company:      { restore_mode: full }
    custom:       { restore_mode: full }
  client_report:
    person:       { restore_mode: formatted }
    email:        { restore_mode: masked }
    address:      { restore_mode: masked }
    phone:        { restore_mode: masked }
    company:      { restore_mode: full }
    custom:       { restore_mode: masked }
  knowledge_base:
    person:       { restore_mode: abstract }
    email:        { restore_mode: none }
    address:      { restore_mode: none }
    phone:        { restore_mode: none }
    company:      { restore_mode: abstract }
    custom:       { restore_mode: none }

The default restore_mode: full means that unless a specific output channel overrides it, all entities are restored to their original values. The knowledge_base channel overrides this to abstract for person and company names, replacing them with semantic descriptions like (party A) or (opposing counsel) so that analyses can be reused across matters without exposing client identities.
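
The fallback from a channel override to the policy default can be sketched as a simple lookup. This is an illustration of the behavior described above, not OGuardAI's internal implementation; the `DEFAULTS` and `CHANNEL_RULES` dictionaries mirror part of the example policy, and `resolve_restore_mode` is a hypothetical helper.

```python
# Hypothetical in-memory mirror of part of the example policy's
# defaults and channel_rules sections.
DEFAULTS = {"restore_mode": "full"}
CHANNEL_RULES = {
    "attorney_review": {"person": "full", "company": "full"},
    "client_report": {"person": "formatted", "email": "masked", "company": "full"},
    "knowledge_base": {"person": "abstract", "email": "none", "company": "abstract"},
}


def resolve_restore_mode(channel: str, entity_type: str) -> str:
    """Use the channel-specific override if present, else the policy default."""
    overrides = CHANNEL_RULES.get(channel, {})
    return overrides.get(entity_type, DEFAULTS["restore_mode"])


print(resolve_restore_mode("knowledge_base", "person"))    # channel override
print(resolve_restore_mode("client_report", "email"))      # channel override
print(resolve_restore_mode("unlisted_channel", "person"))  # falls back to default
```

Because the default is `full`, any channel or entity type that is not listed explicitly is restored to its original value, so restrictive channels should enumerate every entity type they need to limit.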

Example API Call

Transform a contract clause

curl -X POST http://localhost:3000/v1/transform \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-key-here" \
  -d '{
    "input": "ASSET PURCHASE AGREEMENT\n\nThis Agreement is entered into as of January 15, 2026, by and between Meridian Technologies Inc., a Delaware corporation (\"Buyer\"), and Apex Digital Solutions LLC, an Oregon limited liability company (\"Seller\").\n\nMatter: MTR-2026-004817\n\nSection 4.2 Indemnification. Seller shall indemnify Buyer against all losses arising from breaches of representations in Section 3. The indemnification cap is USD 5,000,000. Claims must be submitted to Robert Langford (robert.langford@meridiantech.com) within 18 months of closing.\n\nGoverning Law: State of Delaware.",
    "policy": "legal-privilege"
  }'

Response (tokenized)

{
  "safe_text": "ASSET PURCHASE AGREEMENT\n\nThis Agreement is entered into as of January 15, 2026, by and between {{company:co_001}}, a Delaware corporation (\"Buyer\"), and {{company:co_002}}, an Oregon limited liability company (\"Seller\").\n\nMatter: {{custom:ct_001}}\n\nSection 4.2 Indemnification. Seller shall indemnify Buyer against all losses arising from breaches of representations in Section 3. The indemnification cap is USD 5,000,000. Claims must be submitted to {{person:p_001}} ({{email:e_001}}) within 18 months of closing.\n\nGoverning Law: State of Delaware.",
  "session_id": "01916c2a-5d3e-7000-8000-000000000003",
  "session_state": "eyJ2IjoxLCJzaWQiOi...",
  "entity_context": [
    { "token": "{{company:co_001}}", "type": "company", "role": "buyer" },
    { "token": "{{company:co_002}}", "type": "company", "role": "seller" },
    { "token": "{{person:p_001}}", "type": "person", "role": "contact" }
  ],
  "stats": { "entities_detected": 5, "entities_transformed": 5, "entities_blocked": 0 }
}

The contract structure, legal terms, indemnification cap, governing law, and temporal clauses pass through unchanged. The AI model can analyze the indemnification clause, compare it against market terms, and flag unusual provisions -- all without knowing who the parties are.
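
Before sending model output back for rehydration, an application may want to confirm that the model only used tokens it was given. The sketch below assumes the `{{type:id}}` token shape seen in the example response; `TOKEN_RE`, `extract_tokens`, and `unknown_tokens` are hypothetical helpers, and whether OGuardAI performs a similar check itself is not stated here.

```python
import re

# Token shape inferred from the example response: {{type:id}}, e.g. {{company:co_001}}.
TOKEN_RE = re.compile(r"\{\{([a-z_]+):([A-Za-z0-9_]+)\}\}")


def extract_tokens(text: str) -> set[str]:
    """Return the set of full token strings found in text."""
    return {m.group(0) for m in TOKEN_RE.finditer(text)}


def unknown_tokens(safe_text: str, llm_output: str) -> set[str]:
    """Tokens the model emitted that were never present in the tokenized input."""
    return extract_tokens(llm_output) - extract_tokens(safe_text)


safe_text = (
    "by and between {{company:co_001}} and {{company:co_002}}. "
    "Matter: {{custom:ct_001}}. Claims go to {{person:p_001}} ({{email:e_001}})."
)
llm_output = "{{company:co_002}} indemnifies {{company:co_001}}; notify {{person:p_002}}."

print(sorted(extract_tokens(safe_text)))
print(unknown_tokens(safe_text, llm_output))
```

A non-empty result from `unknown_tokens` suggests the model invented or altered a token, which is worth flagging for review rather than passing through silently.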

Rehydrate for the attorney

curl -X POST http://localhost:3000/v1/rehydrate \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-key-here" \
  -d '{
    "output": "<LLM-generated contract analysis with tokens>",
    "session_state": "eyJ2IjoxLCJzaWQiOi...",
    "output_channel": "attorney_review"
  }'

The attorney sees the full analysis with all party names, matter numbers, and contact details restored. The knowledge base version uses abstract identifiers so the analysis can be reused across matters without exposing client identities.
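
The two curl calls can be chained from application code. The following is a rough sketch using only the Python standard library, with the endpoint paths, headers, and field names taken from the curl examples. The helper names (`build_request`, `post_json`, `review_clause`) and the `call_llm` callback are hypothetical, and because the rehydrate response shape is not shown in this guide, the sketch returns the decoded JSON as-is.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # same host as the curl examples
API_KEY = "your-key-here"           # placeholder, as in the curl examples


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request with the headers used in the curl examples."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
        method="POST",
    )


def post_json(path: str, payload: dict) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_request(path, payload)) as response:
        return json.load(response)


def review_clause(clause: str, call_llm) -> dict:
    """Transform, pass the tokenized text to call_llm, then rehydrate for the attorney."""
    transformed = post_json(
        "/v1/transform", {"input": clause, "policy": "legal-privilege"}
    )
    # call_llm is supplied by the caller and only ever sees tokenized text.
    analysis = call_llm(transformed["safe_text"])
    return post_json(
        "/v1/rehydrate",
        {
            "output": analysis,
            "session_state": transformed["session_state"],
            "output_channel": "attorney_review",
        },
    )
```

Keeping the LLM call behind a callback makes the trust boundary explicit: the only text that function receives is `safe_text`, never the original clause.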

Compliance and Privilege Notes

Requirement	How OGuardAI Addresses It
Attorney-client privilege	Client identities and matter details are tokenized before the request leaves your trust boundary, so the LLM provider never receives them; this reduces the risk of a waiver argument based on disclosure to a non-privileged third party
ABA Model Rule 1.6 (Confidentiality)	Reasonable measures to prevent disclosure: tokenization, AES-256-GCM encryption, session expiry, trust boundary enforcement
GDPR (client PII in EU matters)	PII tokenization supports data minimization (Art. 5(1)(c)); no persistent storage of personal data
Litigation hold compatibility	Session blobs can be preserved for litigation hold; token mappings are deterministic and reproducible within the session
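Cross-border data restrictions	Tokenized text contains no direct identifiers, which reduces exposure when it crosses jurisdictional boundaries; whether it still counts as personal data depends on the applicable law, because the holder of the session blob can restore it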
Conflicts check	Abstract restore mode in the knowledge base prevents inadvertent conflicts -- attorneys cannot identify parties in anonymized precedent analyses

Ethical Considerations

Legal AI introduces specific ethical obligations. OGuardAI addresses the data protection dimension, but attorneys retain responsibility for:

Reviewing and verifying all AI-generated legal analysis before relying on it
Ensuring that AI use complies with applicable bar rules and court orders
Maintaining competence in understanding the technology's capabilities and limitations (ABA Model Rule 1.1, Comment 8)
Disclosing AI use to clients where required by jurisdiction or engagement terms

Related Documentation

Customer Support Case Study -- Walkthrough of a support workflow with formal language handling
GDPR Compliance -- GDPR-specific documentation for legal teams handling EU client data
Compliance Controls Mapping -- HIPAA, GDPR, SOC 2, PCI DSS control mapping
Extending Entity Types -- How to add custom entity patterns for matter numbers and case IDs
