Architecture
Detector Capabilities
Entity detection capabilities by runtime mode, language support matrix, performance characteristics, and deployment modes

| Entity Type | Builtin (Regex) | + GLiNER | + spaCy |
|---|---|---|---|
| Email | High | High | High |
| Phone | High (intl + DE) | High | High |
| SSN | High | High | High |
| IBAN | High | High | High |
| Credit Card | High (Luhn) | High | High |
| IP Address | High (v4 + v6) | High | High |
| URL | High | High | High |
| Customer ID | High (pattern) | High | High |
| Order Number | High (pattern) | High | High |
| Passport | Medium (pattern) | High | Medium |
| Health ID | Medium (pattern) | High | Medium |
| Date of Birth | High (with context) | High | High |
| German Tax ID | High | High | High |
| German Social Security | High | High | High |
| Person Name | None | Good | Good |
| Company Name | None | Good | Moderate |
| Location | None | Good | Good |
| Address (street) | Medium (DE/US) | Better | Better |
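
The "High (Luhn)" rating for credit cards suggests that regex candidates are checked with the Luhn checksum before being reported. The following is a minimal Python sketch of that idea, not the detector's actual Rust implementation. The pattern `CARD_RE` and the function names are hypothetical and exist only for illustration.

```python
import re

# Hypothetical candidate pattern: 13 to 19 digits, optionally separated
# by single spaces or hyphens. The real builtin pattern is not shown in the source.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_credit_cards(text: str) -> list[str]:
    """Return regex candidates that also pass the Luhn check."""
    return [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]
```

The checksum step is what lets a pure pattern matcher reject most random digit runs (order numbers, tracking codes) that happen to have a card-like length.
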

| Language | Builtin | + GLiNER | + spaCy |
|---|---|---|---|
| English | Full | Full | Full |
| German | Full | Full | Full |
| French | Partial | Full | Full (with model) |
| Spanish | Partial | Full | Full (with model) |
| Italian | Partial | Good | Good |
| Arabic | Basic | Good | Limited |
| Japanese | Basic | Good | Limited |
| Chinese | Basic | Good | Limited |
| Korean | Basic | Good | Limited |
| Russian | Basic | Good | Good |
| Other (20+) | Basic | Good | Varies |

| Metric | Builtin Only | + GLiNER | + spaCy |
|---|---|---|---|
| Cold start | <1ms | 5-30s (model load) | 3-15s (model load) |
| Idle memory | ~10MB | ~500MB (CPU) / ~2GB (GPU) | ~200MB |
| Latency p50 | <1ms | 10-50ms | 5-20ms |
| Deterministic | Yes | Mostly | Mostly |
| GPU required | No | Optional (faster) | No |

| Mode | Config | What Runs | Best For |
|---|---|---|---|
| builtin | `detector.mode: builtin` | Rust regex only | Low-latency, structured data |
| advanced | `detector.mode: advanced` + `detector.advanced_url` | Python NER only | Maximum accuracy |
| both | `detector.mode: both` + `detector.advanced_url` | Rust regex + Python NER merged | Best coverage |

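
In `both` mode the regex and NER results are merged, but the merge strategy is not described here. The sketch below shows one plausible approach, assuming that regex spans take priority whenever they overlap an NER span. The `Span` type and `merge_spans` function are hypothetical names, not part of the product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str   # e.g. "EMAIL", "PERSON"
    source: str  # "regex" or "ner"


def merge_spans(regex_spans, ner_spans):
    """Merge two detector outputs into one non-overlapping list.

    Assumption: regex spans are considered first, so they win any overlap.
    """
    kept = []
    for span in list(regex_spans) + list(ner_spans):
        # Keep the span only if it does not overlap anything already kept.
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)
```

Under this assumption, a person name that NER finds inside an email address is dropped in favor of the full email span, while NER-only entities elsewhere in the text are kept.
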
When the Python NER sidecar is unavailable:
- Server logs: `ner_service_unavailable_falling_back_to_builtin`
- Detection continues with builtin regex only
- Health endpoint reports: `detector_status: "degraded"` (if configured as both) or `"builtin_only"`
- No crash, no hang, no data loss
- Entity types requiring NER (Person, Company, Location) will not be detected
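
The fallback behavior above can be sketched in Python as follows. This is an illustration under stated assumptions, not the server's actual code: `NerUnavailable`, `detect`, `run_builtin`, and `run_ner` are hypothetical names, the `"ok"` status is invented for the healthy case, and the sketch assumes `"builtin_only"` is reported when the configured mode is `advanced`.

```python
import logging

logger = logging.getLogger("detector")


class NerUnavailable(Exception):
    """Raised by the (hypothetical) NER client when the sidecar is unreachable."""


def detect(text, mode, run_builtin, run_ner):
    """Return (entities, detector_status) for the configured mode."""
    if mode == "builtin":
        return run_builtin(text), "ok"
    try:
        ner_entities = run_ner(text)
    except NerUnavailable:
        # Log the documented event name and continue with regex only.
        logger.warning("ner_service_unavailable_falling_back_to_builtin")
        # Assumption: "degraded" for both, "builtin_only" otherwise.
        status = "degraded" if mode == "both" else "builtin_only"
        return run_builtin(text), status
    if mode == "both":
        return run_builtin(text) + ner_entities, "ok"
    return ner_entities, "ok"
```

The key design point is that the NER failure is caught at the call site and turned into a status value, so the request still completes with the builtin results instead of raising.
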