# Agent Integration

Protect PII across multi-step agent workflows with OGuardAI.
## The Problem
Agent frameworks (LangGraph, CrewAI, AutoGen) run multi-step workflows:
- Read user input (contains PII)
- Call LLM (PII leaks to provider)
- Call tools (PII leaks to external services)
- Store in memory (PII persists unprotected)
- Generate response (PII in output)
Each step is a leak point, and a multi-step agent multiplies them: a single conversation can expose the same PII to several providers and external services.
## The Pattern: Wrap Every Boundary

```text
User Input -> [OGuardAI Transform] -> Safe Input
                      |
                 Agent Loop:
                   LLM Call      (sees tokens only)
                   Tool Call     [OGuardAI Transform arguments]
                   Tool Result   [OGuardAI Transform result]
                   Memory Store  (tokens only)
                      |
Safe Output -> [OGuardAI Rehydrate] -> Final Output
```

## Example: LangGraph
```python
from guardai_sdk import OGuardAIClient

guardai = OGuardAIClient("http://localhost:3000")

def protected_node(state):
    # Transform input
    result = guardai.transform(state["input"])
    safe_input = result.safe_text
    session = result.session_state

    # Call LLM with safe input
    llm_output = llm.invoke(safe_input)

    # Rehydrate output
    restored = guardai.rehydrate(llm_output, session)
    return {"output": restored.restored_text}
```

## Example: CrewAI
```python
from crewai import Agent, Task, Crew
from guardai_sdk import OGuardAIClient

guardai = OGuardAIClient("http://localhost:3000")

class OGuardAIAgent(Agent):
    """Agent wrapper that auto-protects PII at every boundary."""

    def execute_task(self, task):
        # Transform the task input
        result = guardai.transform(task.description)
        safe_task = Task(
            description=result.safe_text,
            expected_output=task.expected_output,
        )

        # Execute with safe input
        output = super().execute_task(safe_task)

        # Rehydrate the output
        restored = guardai.rehydrate(output, result.session_state)
        return restored.restored_text
```

## Example: Multi-Step Tool Chain
```python
import json

# Step 1: User asks about a customer
user_input = "What is Julia Schneider's account balance?"
result = guardai.transform(user_input)
# result.safe_text: "What is {{person:p_001}}'s account balance?"

# Step 2: LLM decides to call a tool
llm_response = llm.invoke(result.safe_text)
# LLM output: "I'll look up {{person:p_001}}'s account."

# Step 3: Tool call - transform the arguments via JSON input
tool_args = {"customer_name": "Julia Schneider"}
safe_args_result = guardai.transform(
    json.dumps(tool_args), session_state=result.session_state
)
# safe_args_result.safe_text contains the tokenized JSON string

# Step 4: Tool returns data - transform the result, carrying the
# accumulated session forward so new entities join the same session
tool_result = crm.lookup(customer_name="Julia Schneider")
# tool_result: {"balance": 5000, "email": "julia@firma.de"}
safe_result = guardai.transform(
    json.dumps(tool_result), session_state=safe_args_result.session_state
)

# Step 5: LLM generates final answer with safe data
final = llm.invoke(f"The result is: {safe_result.safe_text}")
# LLM output: "{{person:p_001}}'s balance is 5000. Contact: {{email:e_001}}"

# Step 6: Rehydrate only at the final output, using the accumulated session
restored = guardai.rehydrate(final, safe_result.session_state)
# restored.restored_text: "Julia Schneider's balance is 5000. Contact: julia@firma.de"
```

## Key Rules
- Transform before every LLM call -- never send raw PII to a model
- Transform tool arguments -- tool calls may contain PII from earlier steps
- Transform tool results -- external tools return PII that re-enters the loop
- Store only tokenized text in memory -- agent memory should never hold raw PII
- Rehydrate only at the final output -- restore PII only when returning to the user
- Pass session state through the agent context -- all steps share the same session for entity identity
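The rules above can be exercised end to end with a toy stand-in for the client. Everything here is illustrative: `StubGuard`, its dict-based session, and the hard-coded entity are stand-ins for the real OGuardAI service, which returns an opaque sealed blob rather than a dict.

```python
from dataclasses import dataclass

@dataclass
class TransformResult:
    safe_text: str
    session_state: dict

@dataclass
class RehydrateResult:
    restored_text: str

class StubGuard:
    """Toy stand-in for OGuardAIClient (illustration only)."""

    def transform(self, text, session_state=None):
        # Copy the incoming session so each result carries the accumulated map
        mapping = dict(session_state or {})
        if "Julia Schneider" in text:
            mapping.setdefault("Julia Schneider", "{{person:p_001}}")
        safe = text
        for raw, token in mapping.items():
            safe = safe.replace(raw, token)
        return TransformResult(safe, mapping)

    def rehydrate(self, text, session_state):
        for raw, token in session_state.items():
            text = text.replace(token, raw)
        return RehydrateResult(text)

guard = StubGuard()
r1 = guard.transform("Email Julia Schneider about the invoice.")
r2 = guard.transform("Julia Schneider prefers morning calls.",
                     session_state=r1.session_state)
# Same entity, same token, because the session is carried forward
assert "{{person:p_001}}" in r1.safe_text
assert r1.session_state == r2.session_state
# Rehydrate once, only at the very end
final = guard.rehydrate(r2.safe_text, r2.session_state)
assert final.restored_text == "Julia Schneider prefers morning calls."
```

The shape of the loop is what matters: transform at every boundary, thread one session through, rehydrate exactly once.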
## Anti-Patterns
### Do NOT: Cache raw PII in agent memory

```python
# BAD: raw PII in memory
memory.save("Julia Schneider's email is julia@firma.de")

# GOOD: tokenized text in memory
result = guardai.transform("Julia Schneider's email is julia@firma.de")
memory.save(result.safe_text)
# Saves: "{{person:p_001}}'s email is {{email:e_001}}"
```

### Do NOT: Rehydrate in the middle of an agent loop
```python
# BAD: rehydrating mid-loop leaks PII back into the agent context
step1_output = llm.invoke(safe_input)
restored = guardai.rehydrate(step1_output, session)  # PII is now raw again
step2_output = llm.invoke(restored)  # PII sent to LLM again!

# GOOD: keep everything tokenized until the final output
step1_output = llm.invoke(safe_input)
step2_output = llm.invoke(step1_output)  # still tokenized
final = guardai.rehydrate(step2_output, session)  # restore only at the end
```

### Do NOT: Create separate sessions per step
```python
# BAD: separate sessions lose entity identity
result1 = guardai.transform("Julia Schneider")  # session A
result2 = guardai.transform("Julia Schneider")  # session B (different token!)

# GOOD: accumulate into one session
result1 = guardai.transform("Julia Schneider")
result2 = guardai.transform("More about Julia", session_state=result1.session_state)
# Same person gets the same token across both calls
```

## Session State Management
OGuardAI uses sealed session state (encrypted blobs) that travel with the request. This means:
- No server-side state -- the session is in the blob
- Portable -- pass the blob through any agent framework's state mechanism
- Secure -- AES-256-GCM encrypted, tamper-proof
- Accumulative -- each transform call can build on previous session state
```python
# Session accumulation through agent steps
session = None
for step in agent_steps:
    result = guardai.transform(step.input, session_state=session)
    session = result.session_state  # carry forward
    step.safe_input = result.safe_text

# Final rehydration with the accumulated session
final = guardai.rehydrate(agent_output, session)
```

## Framework-Specific Notes
### LangGraph

- Store session state in the graph's state dict
- Use the `protected_node` pattern at each node boundary
- The session blob is just a string -- it fits naturally in LangGraph state
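A minimal sketch of carrying the session blob through a LangGraph-style state dict. A plain dict stands in for the graph state, and the toy `tokenize` stands in for `guardai.transform`; the key name `guardai_session` is an assumption, and nothing here is LangGraph API.

```python
# Toy tokenizer standing in for guardai.transform: returns safe text plus
# the accumulated session (in real use, an opaque encrypted string blob)
def tokenize(text, session=None):
    session = dict(session or {})
    session.setdefault("Julia Schneider", "{{person:p_001}}")
    for raw, token in session.items():
        text = text.replace(raw, token)
    return text, session

def protected_node(state):
    # Read the session blob out of graph state, transform, carry it forward
    safe_input, session = tokenize(state["input"], state.get("guardai_session"))
    # ... invoke the LLM with safe_input here ...
    return {"safe_input": safe_input, "guardai_session": session}

state = {"input": "Summarize Julia Schneider's open ticket"}
state.update(protected_node(state))
assert "Julia Schneider" not in state["safe_input"]
```

Because each node returns a partial state update that includes the session, every downstream node can keep building on the same entity map.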
### CrewAI

- Wrap agents with an OGuardAI middleware
- Transform task descriptions before execution
- Rehydrate crew output after all agents complete
### AutoGen

- Use OGuardAI in the message preprocessing hook
- Transform all messages before they reach the LLM
- Store session state in the conversation context
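As a sketch of that hook, the following tokenizes every message before it would reach the LLM and keeps the session in a shared context dict. The hook name `guard_messages`, the context shape, and the toy tokenizer are all illustrative; AutoGen's actual hook registration API may differ.

```python
# Toy stand-in for guardai.transform over a single string
def tokenize(text, session):
    session.setdefault("julia@firma.de", "{{email:e_001}}")
    for raw, token in session.items():
        text = text.replace(raw, token)
    return text

def guard_messages(messages, context):
    # Session lives in the conversation context, shared across all turns
    session = context.setdefault("guardai_session", {})
    return [{**m, "content": tokenize(m["content"], session)} for m in messages]

context = {}
safe = guard_messages(
    [{"role": "user", "content": "Reach me at julia@firma.de"}], context
)
assert "julia@firma.de" not in safe[0]["content"]
```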
### OpenAI Agents SDK

- Use the OGuardAI proxy as the base URL
- The proxy handles transform/rehydrate transparently
- No code changes needed in the agent definition