Self-Hosted PII Detection on Your Own Hardware

Run PII detection entirely inside your own boundary: serve the models with PhEye, embed Phileas, or run local ONNX inference, with no data egress.

The most important property of a PII redaction tool is often architectural, not algorithmic: where does the detection run? Cloud PII APIs require you to send your text to a third-party service to find the sensitive data in it. For regulated or sensitive data, that does not reduce exposure; it relocates it and adds a vendor to your compliance boundary.

The Philterd toolkit is built the other way. The models run on your hardware, inside your own boundary, with no third-party API, no model-provider account, and no text leaving your infrastructure. This guide covers the three ways to do that and when to choose each. (For the deployment mechanics once you have chosen, the other operations guides cover CloudFormation, TLS, reverse proxies, and monitoring.)

Three ways to run detection in your boundary

All three run the same models and produce the same detections. They differ in where the model runs relative to your application.

PhEye as a self-hosted service

PhEye is the model server for the toolkit. You run it on your own infrastructure; it loads one or more models as lenses and exposes a simple HTTP endpoint. Philter, Phileas, or any HTTP client posts text and gets back entities with confidence scores. Nothing leaves your network.

curl http://localhost:5000/find \
    --data "Please forward the invoice to Maria Gonzalez."

[
  { "type": "PER", "text": "Maria Gonzalez", "confidence": 0.98 }
]
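The same call can be made from application code. The following is a minimal Python sketch, assuming the endpoint and response shape shown in the curl example above. The function names, the `min_confidence` parameter, and the timeout are illustrative and not part of PhEye. The Content-Type header mirrors what `curl --data` sends by default; whether your server expects a different one is something to confirm against the PhEye documentation.

```python
import json
import urllib.request

# Endpoint taken from the curl example above; adjust host and port for your deployment.
PHEYE_URL = "http://localhost:5000/find"


def build_request(text: str, url: str = PHEYE_URL) -> urllib.request.Request:
    """Build a POST request carrying the raw text as the body, like `curl --data`."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        # Assumption: mirror the content type curl --data sends by default.
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_entities(body: bytes, min_confidence: float = 0.0) -> list[dict]:
    """Decode the JSON array of entities and keep those at or above min_confidence."""
    entities = json.loads(body.decode("utf-8"))
    return [e for e in entities if e["confidence"] >= min_confidence]


def find_entities(text: str, min_confidence: float = 0.0, timeout: float = 10.0) -> list[dict]:
    """Post text to the local PhEye endpoint and return the filtered entities."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as response:
        return parse_entities(response.read(), min_confidence)
```

Calling `find_entities("Please forward the invoice to Maria Gonzalez.", min_confidence=0.9)` against a running local service would return only the entities scored at 0.9 or higher. Keeping request building and response parsing in separate functions lets you test both without a live server.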

Use this when several applications need detection, or when you want to scale model serving independently and run it on dedicated hardware (for example a GPU host) separate from the applications that call it.

Phileas embedded as a library

Phileas is a library, available for Java, Python, and .NET. You embed it directly in your application, so detection and redaction happen in-process with no separate service. The same JSON policy that drives Philter drives the embedded library. This is the simplest deployment when one application owns the redaction and you want no extra moving parts.

Local ONNX inference, in-process

For the JVM, the phileas-pheye-onnx module lets Phileas run a GLiNER model itself, in-process, via ONNX Runtime, with no PhEye service at all. Add it as a dependency and point a PhEye filter at a local model directory:

{
  "phEyeConfiguration": {
    "modelPath": "/models/ph-eye-pii-en-small",
    "labels": ["person"]
  }
}

With no modelPath, Phileas calls a remote PhEye service as before; with one set, it loads the model and runs detection locally. This path is parity-verified against the Python reference, so it produces the same spans as the server would. See the phileas-pheye-onnx documentation for the model directory layout and details.
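The switch described above can be written down as a small check, which is handy for confirming which mode a given policy will use before you deploy it. This Python sketch only mirrors the rule stated in the text; it is not Phileas code. The helper name `detection_mode` is hypothetical, and treating an empty `modelPath` as unset is an assumption.

```python
import json


def detection_mode(policy_fragment: dict) -> str:
    """Return "local" when phEyeConfiguration sets a non-empty modelPath, else "remote"."""
    config = policy_fragment.get("phEyeConfiguration", {})
    # Assumption: an empty or missing modelPath means the remote PhEye service is used.
    return "local" if config.get("modelPath") else "remote"


# The fragment from the text: a model directory is set, so inference is in-process.
local_fragment = json.loads("""
{
  "phEyeConfiguration": {
    "modelPath": "/models/ph-eye-pii-en-small",
    "labels": ["person"]
  }
}
""")

# The same fragment without modelPath: detection goes to a PhEye service.
remote_fragment = {"phEyeConfiguration": {"labels": ["person"]}}

print(detection_mode(local_fragment))   # local
print(detection_mode(remote_fragment))  # remote
```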

The models

The detection models are published on Hugging Face: ph-eye-pii-en-xsmall, -small, -medium, and -large. They are English person-name detectors and they run on CPU, so no GPU is required. The four sizes let you trade footprint and latency against a little accuracy: xsmall is the lightest, a roughly 90 MB int8 graph suited to edge and in-browser use; large has the highest capacity and is meant for server-side workloads. For what they detect and how they were trained, see How We Built PhEye’s PII Name Models.

Whichever path you choose, the structured PII types (emails, phone numbers, SSNs, card numbers) are handled by the pattern-based detectors in Phileas and Philter, which also run in your boundary. A full pipeline therefore detects names with the models and the structured types with deterministic rules, all without any data egress.
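To make the division of labor concrete, here is a minimal Python sketch of a two-stage pipeline: model-detected names plus deterministic patterns, merged and redacted locally. The regular expressions, the `{{TYPE}}` placeholder format, and all function names are illustrative assumptions; they are not the rules or redaction formats that Phileas and Philter ship. Because the example response earlier carries entity text rather than offsets, names are located by searching for that text, which is also an assumption.

```python
import re

# Illustrative patterns only; they are not the rules Phileas or Philter ship.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def pattern_spans(text: str) -> list[dict]:
    """Find structured PII with deterministic regular expressions."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append({"type": label, "start": match.start(), "end": match.end()})
    return spans


def name_spans(text: str, entities: list[dict]) -> list[dict]:
    """Locate each model-detected entity's text in the input to get character offsets."""
    spans = []
    for entity in entities:
        for match in re.finditer(re.escape(entity["text"]), text):
            spans.append({"type": entity["type"], "start": match.start(), "end": match.end()})
    return spans


def redact(text: str, spans: list[dict]) -> str:
    """Replace each span with a placeholder, right to left so earlier offsets stay valid."""
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        text = text[: span["start"]] + "{{" + span["type"] + "}}" + text[span["end"]:]
    return text


text = "Maria Gonzalez (maria@example.com) gave SSN 123-45-6789."
# Stand-in for a model response in the shape shown in the PhEye example.
model_entities = [{"type": "PER", "text": "Maria Gonzalez", "confidence": 0.98}]

redacted = redact(text, name_spans(text, model_entities) + pattern_spans(text))
print(redacted)  # {{PER}} ({{EMAIL}}) gave SSN {{SSN}}.
```

This sketch does not resolve overlapping spans, which a real pipeline has to handle.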

Air-gapped and data-sovereignty deployments

Self-hosted detection is what makes the strictest deployments possible, and they cut across industries: regulated healthcare and financial institutions, defense and government, and critical infrastructure all run workloads where data cannot leave a controlled environment.

Fully air-gapped. When the environment has no internet egress at all, the toolkit still runs. The engine, the models, and the policies are artifacts you bring into the environment as container images or files. There is no call-home, no telemetry, and no license server to reach, so detection works with the network cable unplugged.
Data residency and sovereignty. Because nothing is sent to a third party, the data stays in the jurisdiction and the environment you run in: a specific region, AWS GovCloud, Azure Government, your own data center, or on-premises. The redaction step happens before any record crosses a boundary, which is the posture data-residency and sovereignty rules require.
Auditable, not a black box. The detection logic is open source, so a security team can read every path rather than take a vendor’s word for it, which matters when an auditor or oversight body asks how a redaction decision is made.

For the public-sector specifics (FOIA, ATO boundaries, NIST 800-53), see PII Redaction for Government. If you need help standing this up in a constrained environment, that is work we do.

Where this fits

Running detection on your own hardware is not itself a redaction technique. It is the architecture that makes the redaction techniques safe to run, because the sensitive text never has to cross a boundary to be processed.

As always, detection is probabilistic and configurable. These models reduce how much sensitive data gets through; they do not catch every instance. Validate any model and policy against your own representative data, set thresholds for your tolerance, and treat redaction as one layer of a defense-in-depth strategy. You remain the data controller for the data you process.
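One way to approach threshold selection is to score detections against a small hand-labeled sample at several confidence cutoffs and see where precision and recall land. The Python sketch below assumes detections arrive as (text, confidence) pairs and that you have a set of expected entity texts per document. The data is hypothetical, and matching on exact text is a simplification; a fuller evaluation would compare character spans.

```python
def precision_recall(
    predictions: list[tuple[str, float]], expected: set[str], threshold: float
) -> tuple[float, float]:
    """Score detections at or above threshold against the set of expected entity texts."""
    kept = {text for text, confidence in predictions if confidence >= threshold}
    true_positives = len(kept & expected)
    # With nothing kept (or nothing expected) there are no errors of that kind to count.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / len(expected) if expected else 1.0
    return precision, recall


# Hypothetical detections (text, confidence) and hand-labeled names for one document.
predictions = [("Maria Gonzalez", 0.98), ("Invoice", 0.41), ("J. Smith", 0.62)]
expected = {"Maria Gonzalez", "J. Smith"}

for threshold in (0.3, 0.5, 0.7):
    precision, recall = precision_recall(predictions, expected, threshold)
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

In this toy sample, raising the threshold from 0.3 to 0.5 drops the false positive without losing a name, while 0.7 starts missing one. For redaction, a missed name usually costs more than an extra redaction, so the tolerance you set will often favor recall.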

Where to go next

PhEye to serve the models, and Phileas to embed them.
phileas-pheye-onnx for in-process local inference on the JVM.
Writing your first redaction policy to configure what gets detected and how it is redacted.
Modern privacy techniques for where self-hosted detection sits among the techniques.

Frequently asked questions

Does any text leave my infrastructure?

No. Every option here runs the detection models inside your own boundary. There is no third-party API call, no model-provider account, and no text sent anywhere. That is the point: cloud PII APIs require shipping your sensitive text to someone else's cloud to detect PII, which for regulated data simply moves the exposure rather than removing it.

Do I need a GPU?

No. The models run on CPU. The xsmall model is the lightest, a roughly 90 MB int8 graph for edge and in-browser use; the small model is next, and the larger sizes trade footprint and latency for a little more accuracy. A GPU speeds up high-throughput server-side inference but is not required.

Which option should I choose?

Run PhEye as a shared service if several applications need detection or you want to scale model serving on its own hardware. Embed Phileas as a library, or use local ONNX inference, when you want a single self-contained process with no service to call.