# Performance Benchmarks
Throughput, latency, and scaling measurements for OGuardAI.
Measured on a single instance: 1 CPU core (AMD EPYC 7763), 512 MB RAM, Linux 6.1, Rust release build. All numbers collected with wrk2 (constant throughput) over 60-second runs, 100 concurrent connections.
## Throughput (operations/sec)
| Operation | Builtin Only | Builtin + NER | Notes |
|---|---|---|---|
| Transform (single) | 2,800 | 45 | NER bound by sidecar round-trip |
| Rehydrate (single) | 12,000 | 12,000 | No external calls |
| Detect (single) | 3,200 | 50 | Same detection path as transform |
| Batch transform (10 items) | 1,400 batches/s | 30 batches/s | Per-batch, not per-item |
| Batch transform (50 items) | 350 batches/s | 15 batches/s | Per-batch, not per-item |
## Latency by Operation (builtin-only mode)
| Operation | p50 | p95 | p99 |
|---|---|---|---|
| Transform | 0.8 ms | 2.1 ms | 4.2 ms |
| Rehydrate | 0.06 ms | 0.3 ms | 0.7 ms |
| Detect | 0.7 ms | 1.8 ms | 3.9 ms |
| Session seal | 0.05 ms | 0.08 ms | 0.12 ms |
| Session unseal | 0.05 ms | 0.08 ms | 0.12 ms |
## Latency by Payload Size (builtin-only transform)
| Payload | Size | Entities | p50 | p95 | p99 |
|---|---|---|---|---|---|
| Small | Under 500 chars | 1-3 | 0.4 ms | 1.2 ms | 2.8 ms |
| Medium | 1-5 KB | 5-15 | 0.9 ms | 2.5 ms | 5.1 ms |
| Large | 10-100 KB | 20-80 | 3.2 ms | 8.4 ms | 14 ms |
## Latency by Payload Size (builtin + NER transform)
| Payload | Size | Entities | p50 | p95 | p99 |
|---|---|---|---|---|---|
| Small | Under 500 chars | 1-5 | 18 ms | 65 ms | 130 ms |
| Medium | 1-5 KB | 8-25 | 45 ms | 120 ms | 190 ms |
| Large | 10-100 KB | 30-100 | 110 ms | 350 ms | 680 ms |
NER latency is dominated by the Python sidecar (GLiNER model inference + network round-trip). Rehydrate latency is identical in both modes since it performs local string replacement only.
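Rehydrate's speed follows directly from its simplicity: it is a pure in-memory substitution of placeholder tokens back to their original values. A minimal sketch of that idea (the function name, token format, and mapping type here are illustrative, not the actual OGuardAI API):

```rust
use std::collections::HashMap;

/// Illustrative stand-in for a decrypted session blob: maps
/// placeholder tokens back to the original values they replaced.
fn rehydrate(text: &str, mapping: &HashMap<&str, &str>) -> String {
    let mut out = text.to_string();
    // Plain local string replacement, no network or disk I/O --
    // this is why rehydrate latency is identical in both modes.
    for (token, original) in mapping {
        out = out.replace(token, original);
    }
    out
}

fn main() {
    let mut mapping = HashMap::new();
    mapping.insert("<EMAIL_1>", "alice@example.com");
    let restored = rehydrate("Contact <EMAIL_1> for access.", &mapping);
    assert_eq!(restored, "Contact alice@example.com for access.");
    println!("{restored}");
}
```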
## Builtin vs. Builtin + NER: Impact Summary
| Metric | Builtin Only | Builtin + NER | Overhead Factor |
|---|---|---|---|
| p50 transform | 0.8 ms | 45 ms | ~56x |
| p99 transform | 4.2 ms | 190 ms | ~45x |
| Throughput | 2,800 ops/s | 45 ops/s | ~62x lower |
| Entity types | 15 | 18 | +person, company, location |
The overhead comes entirely from the NER sidecar. If NER is configured but the sidecar is unreachable, each request waits out a 5-second timeout before falling back to builtin-only detection.
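That failure mode is a timeout-plus-fallback pattern. A hedged sketch of the shape (the 5-second budget mirrors the text; the type and function names are ours, and the sidecar call is a placeholder, not the actual implementation):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Illustrative detection result.
#[derive(Debug, PartialEq)]
enum Detection {
    Ner(Vec<String>),
    BuiltinOnly(Vec<String>),
}

/// Ask the NER sidecar, but fall back to builtin detection if it
/// does not answer within the timeout (5 s in the text above).
fn detect_with_fallback(text: &str, timeout: Duration) -> Detection {
    let (tx, rx) = mpsc::channel();
    let owned = text.to_string();
    thread::spawn(move || {
        // Placeholder for the sidecar round-trip; a real call would
        // be an HTTP/gRPC request to the GLiNER service.
        let _ = tx.send(fake_sidecar_call(&owned));
    });
    match rx.recv_timeout(timeout) {
        Ok(entities) => Detection::Ner(entities),
        // Timeout or disconnect: every such request pays the full
        // timeout before the builtin-only fallback runs.
        Err(_) => Detection::BuiltinOnly(builtin_detect(text)),
    }
}

// Stubs standing in for the real detectors.
fn fake_sidecar_call(_text: &str) -> Vec<String> { vec!["person".into()] }
fn builtin_detect(_text: &str) -> Vec<String> { vec!["email".into()] }

fn main() {
    let d = detect_with_fallback("mail bob@x.io", Duration::from_secs(5));
    println!("{d:?}");
}
```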
## Horizontal Scaling
| Instances | Builtin Throughput | Builtin + NER Throughput |
|---|---|---|
| 1 | 2,800 ops/s | 45 ops/s |
| 2 | 5,500 ops/s | 88 ops/s |
| 4 | 11,000 ops/s | 170 ops/s |
| 8 | 22,000 ops/s | 340 ops/s |
Scaling is near-linear. Each instance is stateless (sealed session blobs travel with requests). NER scaling requires proportional NER sidecar instances.
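Near-linear scaling makes capacity planning roughly a division problem. A small sketch using the single-instance numbers from the table above (the per-instance figures are the measured values; the helper name is ours):

```rust
/// Instances needed to reach a target throughput, given measured
/// per-instance throughput and assuming near-linear scaling.
fn instances_needed(target_ops: f64, per_instance_ops: f64) -> u32 {
    (target_ops / per_instance_ops).ceil() as u32
}

fn main() {
    // Builtin-only: ~2,800 ops/s per instance (measured above).
    assert_eq!(instances_needed(10_000.0, 2_800.0), 4);
    // Builtin + NER: ~45 ops/s per instance.
    assert_eq!(instances_needed(340.0, 45.0), 8);
    println!("capacity estimates computed");
}
```

In practice scaling is slightly sub-linear (two instances measured 5,500 ops/s, not 5,600), so round up and leave headroom.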
## Memory Footprint
| Component | Memory |
|---|---|
| Base server process | 18 MB |
| Per active request overhead | ~50 KB |
| Per entity in session blob | ~200 bytes |
| Session blob (100 entities) | ~20 KB |
| Session blob (1,000 entities) | ~200 KB |
| Regex pattern cache (15 builtin types) | ~2 MB |
| Peak under 100 concurrent requests | ~35 MB |
Rust has no garbage collector, so there are no GC pauses; memory usage remains stable under sustained load.
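The per-entity figure makes blob growth easy to estimate: roughly 200 bytes per entity, so blob memory scales linearly with entity count. A quick sanity check against the table's numbers (the 200-byte constant comes from the table; the helper name is illustrative, and fixed header overhead is ignored):

```rust
/// Rough session-blob size, using the ~200 bytes/entity figure
/// from the table above (ignores fixed per-blob overhead).
fn blob_size_bytes(entities: usize) -> usize {
    entities * 200
}

fn main() {
    assert_eq!(blob_size_bytes(100), 20_000);    // ~20 KB, as tabled
    assert_eq!(blob_size_bytes(1_000), 200_000); // ~200 KB, as tabled
    println!("blob size estimates match the table");
}
```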
## Methodology
- Tool: wrk2 in constant-throughput mode, with Lua scripts for POST payloads.
- Payloads: Synthetic text with realistic PII density (emails, phones, SSNs, IBANs).
- Warm-up: 10-second warm-up discarded before measurement.
- NER sidecar: Single GLiNER instance on same host, 1 CPU core, 1 GB RAM.
- Repeated: Each benchmark run 3 times; median values reported.
- No sampling: All requests measured, not sampled.
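For reference, a wrk2 invocation matching this methodology might look like the following command fragment (the script path and target URL are placeholders; `-R` sets wrk2's constant request rate):

```shell
# 100 connections, 60 s run, constant 2,800 req/s,
# POST payloads supplied by a Lua script (placeholder path/URL).
wrk -t4 -c100 -d60s -R2800 -s post_transform.lua http://localhost:8080/transform
```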
Numbers will vary with hardware, payload composition, and NER model size. Treat these as baseline expectations for capacity planning, and run the included benchmark suite (`cargo bench -p guardai-perf`) against your own infrastructure for production estimates.