OpenAI Drop-In Proxy

Change one URL to get enterprise PII protection with zero code changes

The Killer Use Case

Before (unprotected)

from openai import OpenAI

client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Customer Julia Schneider (julia@firma.de) needs help"}]
)
# Julia's name and email sent directly to OpenAI

After (protected by OGuardAI proxy)

The proxy runs on port 8081 by default (separate from the OGuardAI server on port 3000 or 8080). Point your SDK's base_url at the proxy:

from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="http://localhost:8081/v1"  # Only change: point to OGuardAI proxy (port 8081)
)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Customer Julia Schneider (julia@firma.de) needs help"}]
)
# OpenAI sees: "Customer {{person:p_001}} ({{email:e_001}}) needs help"
# Response automatically restored: "Dear Frau Julia Schneider..."
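
The OpenAI Python SDK (v1 and later) also reads the base URL from the OPENAI_BASE_URL environment variable, so the redirect can be made purely through deployment configuration, with no code edits at all:

import os

# Equivalent to passing base_url=...; typically set in your deployment
# environment instead: export OPENAI_BASE_URL=http://localhost:8081/v1
os.environ["OPENAI_BASE_URL"] = "http://localhost:8081/v1"

from openai import OpenAI

client = OpenAI(api_key="sk-...")  # no base_url argument; the SDK picks up the env var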

How It Works

  1. Your app sends a request to the OGuardAI proxy (port 8081)
  2. The proxy masks PII in all user and assistant messages
  3. The proxy forwards the masked request to the real OpenAI API
  4. OpenAI responds with tokens in place of PII (e.g., "Dear {{person:p_001}}")
  5. The proxy restores the tokens to their real values
  6. Your app receives the response with the real names restored
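
Concretely, for the request from the example above, the body the proxy forwards in step 3 looks roughly like this (illustrative; the session bookkeeping is internal to the proxy):

# What your app sends to the proxy (step 1)
original = {
    "model": "gpt-4",
    "messages": [
        {"role": "user", "content": "Customer Julia Schneider (julia@firma.de) needs help"}
    ],
}

# What the proxy forwards to OpenAI after masking (steps 2-3)
forwarded = {
    "model": "gpt-4",
    "messages": [
        {"role": "user", "content": "Customer {{person:p_001}} ({{email:e_001}}) needs help"}
    ],
}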

Setup

# Start OGuardAI proxy
oguardai-proxy --target https://api.openai.com --policy default --port 8081

# Or with Docker
docker run -p 8081:8081 ghcr.io/oronts/oronts-guardai/oguardai-proxy:latest \
  --target https://api.openai.com \
  --policy default

Streaming Support

Streaming works transparently:

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    stream=True  # Streaming works through proxy
)
for chunk in stream:
    # delta.content is None on the final chunk, so guard before printing
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

What Gets Protected

Message Role    Scanned?
------------    ---------------------------
system          No (preserves your prompts)
user            Yes
assistant       Yes
tool calls      Yes (function arguments)
tool results    Yes
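
For example, a system prompt passes through untouched while the user message is masked (illustrative, reusing the token format from above):

# Continuing with the client configured above
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # Forwarded verbatim -- system messages are never scanned
        {"role": "system", "content": "You are a polite German support agent."},
        # Masked before forwarding
        {"role": "user", "content": "Customer Julia Schneider needs help"},
    ],
)
# OpenAI sees the system prompt as-is plus "Customer {{person:p_001}} needs help"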

Anthropic Support

Same pattern for Anthropic:

from anthropic import Anthropic

client = Anthropic(
    api_key="sk-ant-...",
    base_url="http://localhost:8081"
)
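
A full call then looks exactly like one made without the proxy (the model name below is just an example):

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Customer Julia Schneider (julia@firma.de) needs help"}],
)
# Anthropic receives the masked text; the proxy restores real values in the response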

Configuration

The proxy respects all OGuardAI policy settings:

# Use a specific policy
oguardai-proxy --target https://api.openai.com --policy strict-pii --port 8081

# With German formal restore
oguardai-proxy --target https://api.openai.com --policy german-support --port 8081

How Token Restoration Works

When the LLM generates output containing tokens like {{person:p_001}}, the proxy automatically restores them using the session state that was created during the request transformation. The restore mode (full, partial, masked, formatted, abstract) is controlled by the policy configuration.

For example, with the german-support policy:

  • Input: "Customer Julia Schneider (julia@firma.de) needs help"
  • To OpenAI: "Customer {{person:p_001}} ({{email:e_001}}) needs help"
  • From OpenAI: "Dear {{person:p_001}}, we have received your request..."
  • To your app: "Dear Frau Julia Schneider, we have received your request..."
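
Conceptually, restoration is a lookup from each token to its original value in the per-session mapping built during masking. A minimal sketch of the idea (not OGuardAI's actual implementation, which also applies the restore mode and token repair):

import re

# Per-session mapping built while masking the request; with a formatted
# restore mode such as german-support, the stored value already carries
# the formal salutation (illustrative)
session = {
    "{{person:p_001}}": "Frau Julia Schneider",
    "{{email:e_001}}": "julia@firma.de",
}

def restore(text: str, session: dict[str, str]) -> str:
    """Replace {{type:id}} tokens with their original values."""
    return re.sub(
        r"\{\{\w+:\w+\}\}",
        lambda m: session.get(m.group(0), m.group(0)),  # leave unknown tokens as-is
        text,
    )

print(restore("Dear {{person:p_001}}, we have received your request...", session))
# -> "Dear Frau Julia Schneider, we have received your request..."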

Architecture

Your App
  |
  | (standard OpenAI SDK calls)
  v
OGuardAI Proxy (port 8081)
  |
  | 1. Transform: mask PII in request
  | 2. Forward: send safe request to OpenAI
  | 3. Rehydrate: restore tokens in response
  |
  v
OpenAI API (never sees real PII)

Limitations

  • The proxy adds a small latency overhead for PII detection and token restoration
  • System messages are not scanned (by design, to preserve your prompts)
  • The proxy requires network access to both your app and the target API
  • Session state is per-request; multi-turn conversations need session continuity (the proxy handles this automatically via sealed session blobs)

Troubleshooting

Tokens not being restored: Check that the LLM is faithfully reproducing the token format {{type:id}}. OGuardAI includes a 3-stage token repair pipeline (strict, repair, fuzzy) that handles common LLM token mangling.
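
As a sketch of what a repair stage can do (illustrative; not OGuardAI's actual pipeline), common manglings such as stray whitespace or a wrong number of braces can be normalized before the session lookup:

import re

def repair_tokens(text: str) -> str:
    """Normalize common LLM manglings of {{type:id}} tokens (illustrative)."""
    # e.g. "{{ person : p_001 }}" or "{person:p_001}" -> "{{person:p_001}}"
    return re.sub(
        r"\{{1,3}\s*(\w+)\s*:\s*(\w+)\s*\}{1,3}",
        r"{{\1:\2}}",
        text,
    )

print(repair_tokens("Dear {{ person : p_001 }}, ..."))
# -> "Dear {{person:p_001}}, ..."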

Performance: The proxy typically adds under 10 ms of overhead for PII detection on short inputs. For large inputs, consider using the chunking API directly.