The most important property of a PII redaction tool is often architectural, not algorithmic: where does the detection run? Cloud PII APIs require you to send your text to a third-party service to find the sensitive data in it. For regulated or sensitive data, that does not reduce exposure; it relocates it and adds a vendor to your compliance boundary.
The Philterd toolkit is built the other way. The models run on your hardware, inside your own boundary, with no third-party API, no model-provider account, and no text leaving your infrastructure. This guide covers the three ways to do that and when to choose each. (For the deployment mechanics once you have chosen, the other operations guides cover CloudFormation, TLS, reverse proxies, and monitoring.)
Three ways to run detection in your boundary
All three run the same models and produce the same detections. They differ in where the model runs relative to your application.
PhEye as a self-hosted service
PhEye is the model server for the toolkit. You run it on your own infrastructure; it loads one or more models as lenses and exposes a simple HTTP endpoint. Philter, Phileas, or any HTTP client posts text and gets back entities with confidence scores. Nothing leaves your network.
curl http://localhost:5000/find \
--data "Please forward the invoice to Maria Gonzalez."
The service responds with a JSON array of the entities it found, each with a confidence score:
[
{ "type": "PER", "text": "Maria Gonzalez", "confidence": 0.98 }
]
Use this when several applications need detection, or when you want to scale model serving independently and run it on dedicated hardware (for example a GPU host) separate from the applications that call it.
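For callers that are not using Philter or Phileas, the same request can be made from any HTTP client. The following is a minimal Python sketch, not an official client. It assumes only what the curl call above shows: the `/find` endpoint on `localhost:5000`, raw text as the request body, and a JSON array of entities with `type`, `text`, and `confidence` fields in the response. The function names and the `min_confidence` parameter are hypothetical, introduced here for illustration.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust host and port for your deployment.
PHEYE_URL = "http://localhost:5000/find"


def build_request(text, url=PHEYE_URL):
    """Build a POST request carrying the raw text as the body, like curl --data."""
    return urllib.request.Request(url, data=text.encode("utf-8"), method="POST")


def parse_entities(body, min_confidence=0.0):
    """Parse the JSON array of entities and keep those at or above a threshold."""
    entities = json.loads(body)
    return [e for e in entities if e["confidence"] >= min_confidence]


def find_entities(text, url=PHEYE_URL, min_confidence=0.0, timeout=10):
    """Send text to a PhEye service inside your network and return its entities."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return parse_entities(resp.read().decode("utf-8"), min_confidence)
```

Keeping request building and response parsing separate from the network call makes the threshold logic testable without a running service. Whether the server expects a particular content type is not covered here, so check the PhEye documentation before relying on this.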
Phileas embedded as a library
Phileas is a library, available for Java, Python, and .NET. You embed it directly in your application, so detection and redaction happen in-process with no separate service. The same JSON policy that drives Philter drives the embedded library. This is the simplest deployment when one application owns the redaction and you want no extra moving parts.
Local ONNX inference, in-process
For the JVM, the phileas-pheye-onnx module lets Phileas run a GLiNER model itself, in-process, via ONNX Runtime, with no PhEye service at all. Add it as a dependency and point a PhEye filter at a local model directory:
{
"phEyeConfiguration": {
"modelPath": "/models/ph-eye-pii-en-small",
"labels": ["person"]
}
}
With no modelPath, Phileas calls a remote PhEye service as before; with one set, it loads the model and runs detection locally. This path is parity-verified against the Python reference, so it produces the same spans as the server would. See the phileas-pheye-onnx documentation for the model directory layout and details.
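The modelPath rule above is easy to check before you deploy a policy. The Python sketch below only illustrates that rule against the JSON fragment shown earlier. It is not how Phileas implements the switch, and `detection_mode` is a hypothetical helper name.

```python
import json


def detection_mode(policy_fragment):
    """Return "local" when modelPath is set, otherwise "remote"."""
    config = json.loads(policy_fragment).get("phEyeConfiguration", {})
    return "local" if config.get("modelPath") else "remote"


local = '{"phEyeConfiguration": {"modelPath": "/models/ph-eye-pii-en-small", "labels": ["person"]}}'
remote = '{"phEyeConfiguration": {"labels": ["person"]}}'
print(detection_mode(local), detection_mode(remote))
```

A check like this in a deployment pipeline can catch a policy that would silently fall back to a remote service in an environment where none is reachable.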
The models
The detection models are published on Hugging Face: ph-eye-pii-en-xsmall, -small, -medium, and -large. They are English person-name detectors and they run on CPU, so no GPU is required. The four sizes let you trade footprint and latency against a little accuracy: the xsmall is the lightest, a roughly 90 MB int8 graph suited to edge and in-browser use; the large model has the highest capacity and is intended for server-side workloads. For what they detect and how they were trained, see How We Built PhEye’s PII Name Models.
Whichever path you choose, the structured PII types (emails, phone numbers, SSNs, card numbers) are handled by the pattern-based detectors in Phileas and Philter, which also run in your boundary. A full pipeline therefore detects names with the models and the structured types with deterministic rules, all without any data egress.
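To make the division of labor concrete, here is a small Python sketch that combines model-detected names with rule-based detection of structured types. It is an illustration under stated assumptions, not the Phileas pipeline: the two regular expressions are deliberately simple stand-ins for the real pattern-based detectors, the `{{TYPE}}` placeholder format is made up, and names are located by searching for the entity text because the sample response shown earlier carries no character offsets.

```python
import re

# Deliberately simple patterns for illustration; not the detectors Phileas ships.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def pattern_spans(text):
    """Find structured PII with deterministic rules."""
    return [
        {"type": label, "start": m.start(), "end": m.end()}
        for label, rx in PATTERNS.items()
        for m in rx.finditer(text)
    ]


def name_spans(text, entities):
    """Turn model entities (type, text) into spans by locating the first match."""
    spans = []
    for e in entities:
        start = text.find(e["text"])
        if start >= 0:
            spans.append({"type": e["type"], "start": start, "end": start + len(e["text"])})
    return spans


def redact(text, spans):
    """Replace each span with a {{TYPE}} placeholder, working right to left."""
    for s in sorted(spans, key=lambda s: s["start"], reverse=True):
        text = text[: s["start"]] + "{{" + s["type"] + "}}" + text[s["end"]:]
    return text


text = "Maria Gonzalez (maria@example.com), SSN 123-45-6789."
entities = [{"type": "PER", "text": "Maria Gonzalez", "confidence": 0.98}]
print(redact(text, name_spans(text, entities) + pattern_spans(text)))
# prints: {{PER}} ({{EMAIL}}), SSN {{SSN}}.
```

Replacing from right to left keeps earlier offsets valid as the text changes length. The sketch does not resolve overlapping spans or repeated names, which a real policy engine has to handle.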
Air-gapped and data-sovereignty deployments
Self-hosted detection is what makes the strictest deployments possible, and they cut across industries: regulated healthcare and financial institutions, defense and government, and critical infrastructure all run workloads where data cannot leave a controlled environment.
- Fully air-gapped. When the environment has no internet egress at all, the toolkit still runs. The engine, the models, and the policies are artifacts you bring into the environment as container images or files. There is no call-home, no telemetry, and no license server to reach, so detection works with the network cable unplugged.
- Data residency and sovereignty. Because nothing is sent to a third party, the data stays in the jurisdiction and the environment you run in: a specific region, AWS GovCloud, Azure Government, your own data center, or on-premises. The redaction step happens before any record crosses a boundary, which is the posture data-residency and sovereignty rules require.
- Auditable, not a black box. The detection logic is open source, so a security team can read every path rather than take a vendor’s word for it, which matters when an auditor or oversight body asks how a redaction decision is made.
For the public-sector specifics (FOIA, ATO boundaries, NIST 800-53), see PII Redaction for Government. If you need help standing this up in a constrained environment, that is work we do.
Where this fits
Running detection on your own hardware is not itself a redaction technique. It is the architecture that makes the redaction techniques safe to run, because the sensitive text never has to cross a boundary to be processed.
As always, detection is probabilistic and configurable. These models reduce how much sensitive data gets through; they do not catch every instance. Validate any model and policy against your own representative data, set thresholds for your tolerance, and treat redaction as one layer of a defense-in-depth strategy. You remain the data controller for the data you process.
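One way to approach that validation is to score detections against a hand-labeled sample at several confidence thresholds and pick the trade-off you can accept. The Python sketch below assumes you have collected predicted names with confidence scores and a gold set of true names for the same text. The data in it is made up purely to show the mechanics and says nothing about how the actual models perform.

```python
def precision_recall(predictions, gold, threshold):
    """Score predicted (text, confidence) pairs against a gold set at one threshold."""
    kept = {text for text, conf in predictions if conf >= threshold}
    true_pos = len(kept & gold)
    precision = true_pos / len(kept) if kept else 1.0
    recall = true_pos / len(gold) if gold else 1.0
    return precision, recall


# Made-up example data: replace with detections and labels from your own records.
predictions = [("Maria Gonzalez", 0.98), ("J. Smith", 0.62), ("Invoice", 0.55)]
gold = {"Maria Gonzalez", "J. Smith", "Li Wei"}

for threshold in (0.5, 0.6, 0.9):
    p, r = precision_recall(predictions, gold, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

For redaction, recall usually matters more than precision, since a missed name is a leak while a false positive only over-redacts. Raising the threshold in this toy data removes the false positive but also starts dropping real names.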
Where to go next
- PhEye to serve the models, and Phileas to embed them.
- phileas-pheye-onnx for in-process local inference on the JVM.
- Writing your first redaction policy to configure what gets detected and how it is redacted.
- Modern privacy techniques for where self-hosted detection sits among the techniques.