
Performance Benchmarks

Throughput, latency, and scaling measurements for OGuardAI

Measured on a single instance: 1 CPU core (AMD EPYC 7763), 512 MB RAM, Linux 6.1, Rust release build. All numbers collected with wrk2 (constant throughput) over 60-second runs, 100 concurrent connections.

Throughput (operations/sec)

| Operation | Builtin Only | Builtin + NER | Notes |
|---|---|---|---|
| Transform (single) | 2,800 | 45 | NER bound by sidecar round-trip |
| Rehydrate (single) | 12,000 | 12,000 | No external calls |
| Detect (single) | 3,200 | 50 | Same detection path as transform |
| Batch transform (10 items) | 1,400 batches/s | 30 batches/s | Per-batch, not per-item |
| Batch transform (50 items) | 350 batches/s | 15 batches/s | Per-batch, not per-item |
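The batch rows are reported per batch, not per item, so the per-item throughput follows by multiplying by the batch size, a quick check of how much batching amortizes per-request overhead:

```rust
// Derive per-item throughput from the per-batch rates in the table above.
fn per_item(batches_per_sec: f64, batch_size: f64) -> f64 {
    batches_per_sec * batch_size
}

fn main() {
    // Builtin-only: batching amortizes per-request overhead vs. 2,800
    // single-item transforms per second.
    println!("10-item batches: {} items/s", per_item(1_400.0, 10.0)); // 14,000 items/s
    println!("50-item batches: {} items/s", per_item(350.0, 50.0));   // 17,500 items/s
}
```

At 50-item batches, builtin-only per-item throughput is roughly 6x the single-item rate.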

Latency by Operation (builtin-only mode)

| Operation | p50 | p95 | p99 |
|---|---|---|---|
| Transform | 0.8 ms | 2.1 ms | 4.2 ms |
| Rehydrate | 0.06 ms | 0.3 ms | 0.7 ms |
| Detect | 0.7 ms | 1.8 ms | 3.9 ms |
| Session seal | 0.05 ms | 0.08 ms | 0.12 ms |
| Session unseal | 0.05 ms | 0.08 ms | 0.12 ms |

Latency by Payload Size (builtin-only transform)

| Payload | Size | Entities | p50 | p95 | p99 |
|---|---|---|---|---|---|
| Small | Under 500 chars | 1-3 | 0.4 ms | 1.2 ms | 2.8 ms |
| Medium | 1-5 KB | 5-15 | 0.9 ms | 2.5 ms | 5.1 ms |
| Large | 10-100 KB | 20-80 | 3.2 ms | 8.4 ms | 14 ms |

Latency by Payload Size (builtin + NER transform)

| Payload | Size | Entities | p50 | p95 | p99 |
|---|---|---|---|---|---|
| Small | Under 500 chars | 1-5 | 18 ms | 65 ms | 130 ms |
| Medium | 1-5 KB | 8-25 | 45 ms | 120 ms | 190 ms |
| Large | 10-100 KB | 30-100 | 110 ms | 350 ms | 680 ms |

NER latency is dominated by the Python sidecar (GLiNER model inference + network round-trip). Rehydrate latency is identical in both modes since it performs local string replacement only.
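A minimal sketch of why rehydration cost is independent of detection mode: it is a pure local substitution of placeholder tokens with the originals stored in the session blob. The function and token names here are illustrative, not OGuardAI's actual API:

```rust
use std::collections::HashMap;

// Hypothetical sketch of rehydration: replace placeholder tokens with the
// originals carried in the session blob. No network round-trip is involved,
// so latency is the same whether builtin or NER detection produced the
// entities.
fn rehydrate(text: &str, entities: &HashMap<&str, &str>) -> String {
    let mut out = text.to_string();
    for (token, original) in entities {
        out = out.replace(token, original);
    }
    out
}

fn main() {
    let mut blob = HashMap::new();
    blob.insert("<EMAIL_1>", "alice@example.com");
    let restored = rehydrate("Contact <EMAIL_1> for access.", &blob);
    println!("{restored}"); // Contact alice@example.com for access.
}
```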

Builtin vs. Builtin + NER: Impact Summary

| Metric | Builtin Only | Builtin + NER | Overhead Factor |
|---|---|---|---|
| p50 transform | 0.8 ms | 45 ms | ~56x |
| p99 transform | 4.2 ms | 190 ms | ~45x |
| Throughput | 2,800 ops/s | 45 ops/s | ~62x fewer |
| Entity types | 15 | 18 | +person, company, location |
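The overhead factors are simply the ratio of the NER-mode number to the builtin-only number from the earlier tables:

```rust
// Overhead factor = NER-mode value / builtin-only value.
fn overhead(ner: f64, builtin: f64) -> f64 {
    ner / builtin
}

fn main() {
    println!("p50: ~{:.0}x", overhead(45.0, 0.8));                  // ~56x
    println!("p99: ~{:.0}x", overhead(190.0, 4.2));                 // ~45x
    println!("throughput: ~{:.0}x fewer", overhead(2_800.0, 45.0)); // ~62x
}
```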

The overhead comes entirely from the NER sidecar. If NER is configured but unavailable, each request incurs a 5-second timeout before falling back to builtin-only detection.
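A sketch of that fallback pattern, using only the standard library; the function names and return types are hypothetical stand-ins, not the real service API:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical sketch: call the NER sidecar on a worker thread and fall
// back to builtin-only detection if no reply arrives within the 5-second
// budget described above.
fn detect_with_fallback(text: &str) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let owned = text.to_string();
    thread::spawn(move || {
        // In the real service this would be an HTTP call to the sidecar.
        let _ = tx.send(ner_detect(&owned));
    });
    match rx.recv_timeout(Duration::from_secs(5)) {
        Ok(entities) => entities,
        Err(_) => builtin_detect(text), // sidecar down: builtin-only result
    }
}

// Stubs standing in for the two detection paths.
fn ner_detect(_text: &str) -> Vec<String> { vec!["person".into()] }
fn builtin_detect(_text: &str) -> Vec<String> { vec!["email".into()] }

fn main() {
    let found = detect_with_fallback("Alice <alice@example.com>");
    println!("{found:?}");
}
```

Note the operational implication: while the sidecar is down but still configured, every request pays the full 5-second timeout, so monitoring sidecar health matters for tail latency.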

Horizontal Scaling

| Instances | Builtin Throughput | Builtin + NER Throughput |
|---|---|---|
| 1 | 2,800 ops/s | 45 ops/s |
| 2 | 5,500 ops/s | 88 ops/s |
| 4 | 11,000 ops/s | 170 ops/s |
| 8 | 22,000 ops/s | 340 ops/s |

Scaling is near-linear. Each instance is stateless (sealed session blobs travel with requests). NER scaling requires proportional NER sidecar instances.
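For capacity planning, a rough instance count can be derived from the single-instance baselines above, assuming the near-linear scaling holds (a simplification; add headroom in practice):

```rust
// Estimate instances needed for a target load, assuming near-linear
// scaling from the measured single-instance baselines
// (2,800 builtin ops/s, 45 NER ops/s).
fn instances_needed(target_ops: f64, per_instance_ops: f64) -> u32 {
    (target_ops / per_instance_ops).ceil() as u32
}

fn main() {
    println!("builtin @ 50k ops/s: {} instances", instances_needed(50_000.0, 2_800.0)); // 18
    println!("NER @ 500 ops/s: {} instances", instances_needed(500.0, 45.0));           // 12
}
```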

Memory Footprint

| Component | Memory |
|---|---|
| Base server process | 18 MB |
| Per active request overhead | ~50 KB |
| Per entity in session blob | ~200 bytes |
| Session blob (100 entities) | ~20 KB |
| Session blob (1,000 entities) | ~200 KB |
| Regex pattern cache (15 builtin types) | ~2 MB |
| Peak under 100 concurrent requests | ~35 MB |
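Session blob size scales linearly with entity count at roughly 200 bytes per entity, which makes sizing a one-line estimate (ignoring any fixed header overhead):

```rust
// Rough session-blob sizing from the ~200 bytes/entity figure above.
fn blob_bytes(entities: u64) -> u64 {
    entities * 200
}

fn main() {
    println!("100 entities: ~{} KB", blob_bytes(100) / 1_000);     // ~20 KB
    println!("1,000 entities: ~{} KB", blob_bytes(1_000) / 1_000); // ~200 KB
}
```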

Rust has no garbage collector, so there are no GC pauses; memory usage stays stable under sustained load.

Methodology

  • Tool: wrk2 in constant-throughput mode, with Lua scripts for POST payloads.
  • Payloads: Synthetic text with realistic PII density (emails, phones, SSNs, IBANs).
  • Warm-up: 10-second warm-up discarded before measurement.
  • NER sidecar: Single GLiNER instance on same host, 1 CPU core, 1 GB RAM.
  • Repetition: Each benchmark was run three times; median values are reported.
  • No sampling: All requests measured, not sampled.

Numbers will vary with hardware, payload composition, and NER model size. Use these as baseline expectations for capacity planning. Run the included benchmark suite (cargo bench -p guardai-perf) against your own infrastructure for production estimates.
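When validating your own runs, the percentile summaries in the tables above can be spot-checked from raw latency samples with the nearest-rank method (wrk2 itself uses an HDR histogram internally; this simplified version is for sanity checks only):

```rust
// Nearest-rank percentile over raw latency samples, the summary statistic
// the tables above report.
fn percentile(samples: &mut [f64], p: f64) -> f64 {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.saturating_sub(1).min(samples.len() - 1)]
}

fn main() {
    // 100 synthetic samples: 0.1 ms, 0.2 ms, ..., 10.0 ms.
    let mut latencies_ms: Vec<f64> = (1..=100).map(|i| i as f64 / 10.0).collect();
    println!("p50 = {} ms", percentile(&mut latencies_ms, 50.0)); // 5.0 ms
    println!("p99 = {} ms", percentile(&mut latencies_ms, 99.0)); // 9.9 ms
}
```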