Specialized name-detection models

PII Lenses

A lens is a specialized AI / NLP model that finds people's names in text. Philterd publishes its English person-name detector in four sizes (extra small, small, medium, and large) so you can match accuracy to your latency and footprint budget, and curated community lenses extend coverage to other languages and clinical text. Structured identifiers (account numbers, national IDs, phone numbers, addresses) are detected separately by Phileas's pattern-based layer. Lenses snap into PhEye (and through it into Phileas and Philter) at configuration time.

Why lenses, not one model

A single general-purpose LLM is an expensive generalist: gigabytes of weights, slow inference, and accuracy that sags on text it was barely trained on. A lens is the opposite: a small, fast model focused on finding names in the size, language, or domain your workload actually needs. That focus buys three things over one big model:

Right-sized and specialized

Pick the model that fits the job. Philterd's English name detector ships in four sizes (extra small, small, medium, and large), so you trade accuracy against latency and footprint instead of running one heavyweight model everywhere. Community lenses add focus where it matters most: French name forms, or clinical text whose dense medical vocabulary makes a general model miss names or over-flag them. That specialization is often what decides whether name recall and precision pass or fail an audit.

Lower compute cost

Each lens is purpose-built and small: tens to hundreds of MB, not the gigabytes of a general LLM. Inference runs through ONNX and is CPU-friendly. The cost difference at scale is one or two orders of magnitude versus calling a hosted LLM for the same task.

Composable at runtime

Run more than one lens at a time and union the detections: an English-name lens, a French-name lens, and your own custom lens can process the same document in one call, each finding the names it is best at. Structured identifiers are added by Phileas's pattern layer. The policy engine merges everything and applies filter strategies.
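As a rough illustration of what a union of detections can look like, the sketch below merges spans from two detectors and resolves overlaps by keeping the higher-scoring span. The Detection type, the lens names, the scores, and the overlap rule are all assumptions made for this example; they are not PhEye's data model or its actual merge logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    # Hypothetical detection shape for illustration; not PhEye's schema.
    start: int
    end: int      # exclusive
    label: str
    score: float
    source: str   # which lens produced the span


def union_detections(*batches):
    """Merge detections from several sources, resolving overlaps by score."""
    merged = []
    # Visit the highest-scoring spans first so a kept span is never
    # displaced by a weaker overlapping one.
    for det in sorted((d for b in batches for d in b), key=lambda d: -d.score):
        overlaps = any(det.start < k.end and k.start < det.end for k in merged)
        if not overlaps:
            merged.append(det)
    return sorted(merged, key=lambda d: d.start)


text = "Marie Dupont emailed John Smith."
english = [Detection(21, 31, "PERSON", 0.97, "en-names"),
           Detection(0, 5, "PERSON", 0.55, "en-names")]
french = [Detection(0, 12, "PERSON", 0.93, "fr-names")]

for det in union_detections(english, french):
    print(det.source, repr(text[det.start:det.end]))
```

Here the partial English match on "Marie" is dropped in favor of the French lens's full-name span, while the English lens's "John Smith" span is kept unchanged.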

Available lenses

Every lens is a self-hosted model that PhEye loads. Philterd's lenses find people's names; some community lenses cover other entity types, as listed in the table. The Source column tells you who built each lens: lenses trained by Philterd, or curated community models from the open source ecosystem. Use our models, community models, or your own, and combine multiple lenses on the same workload. PhEye is model-agnostic and runs entirely inside your boundary. The lens catalog below is tracked by the open source pheye-pii-lenses repository.

Lens	Source	Language(s)	Entity Type(s)	License	Model
English Names (Extra Small)	Philterd	English	PERSON	CC-BY-4.0	philterd/ph-eye-pii-en-xsmall
English Names (Small)	Philterd	English	PERSON	CC-BY-4.0	philterd/ph-eye-pii-en-small
English Names (Medium)	Philterd	English	PERSON	CC-BY-4.0	philterd/ph-eye-pii-en-medium
English Names (Large)	Philterd	English	PERSON	CC-BY-4.0	philterd/ph-eye-pii-en-large
Hospitals	Community	English	Hospital names, room numbers, clinical providers	-	knowledgator/gliner-pii-base-v1.0
Medical Conditions	Community	English	Disease, disorder	-	blaze999/Medical-NER
French Persons	Community	French	Person	-	EmergentMethods/gliner_medium_news-v2.1
French Medical	Community	French	Disease (Maladie)	-	almanach/camembert-bio-gliner-v0.1

Combining lenses

Lenses are designed to be loaded together. A support team processing English and French tickets loads the English Names lens and a French community lens; PhEye serves the union of their name detections on every request, and Phileas’s pattern layer adds the structured identifiers (account numbers, dates, phone numbers). The policy engine then decides which detections to act on and how, based on the configured filter strategies.

Lens loading happens at PhEye configuration time, not per-request. Switching lenses on a workload is a config change and a model reload, not a code change in the calling application.

Need a lens that doesn't exist yet?

The published lenses cover the workloads we see most often. When a customer has a domain or language we haven’t trained for (insurance claims narratives, contact-center transcripts, a specific regional language or naming convention) we can train a custom name lens.

The shape of that work:

You provide a representative annotated dataset (or we help build one from a sample of your real documents under appropriate agreements).
We train and evaluate the lens against a held-out test set; you see precision / recall numbers for the name entity on your own data.
Deployment is the same as any other lens: load it into PhEye, it’s available to Phileas and Philter through the standard configuration.
The trained lens stays yours, subject to the engagement terms: you can deploy it however you need, including in environments where PhEye is not running.
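For readers unfamiliar with the precision / recall numbers mentioned in the steps above, here is a minimal sketch of exact-match span scoring against a held-out set. The tuple layout and the sample spans are invented for illustration, and an actual evaluation may use a different matching rule (for example, partial-overlap credit).

```python
def precision_recall(predicted, gold):
    """Exact-match span scoring; each span is a (doc_id, start, end) tuple."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotated names (gold) and model output (predicted).
gold = {("doc1", 0, 12), ("doc1", 21, 31), ("doc2", 5, 16), ("doc2", 40, 52)}
predicted = {("doc1", 0, 12), ("doc1", 21, 31), ("doc2", 5, 16), ("doc2", 60, 66)}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision is the share of predicted names that were correct; recall is the share of annotated names the model found. In this toy data one prediction is spurious and one gold name is missed, so both come out at 0.75.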

Custom lens engagements are part of our consulting and support. Talk to us with a sample of the data and a description of what generic models miss; we’ll tell you whether a custom lens is the right answer.

FAQ

Is a lens the same as a policy?

No. A lens decides *what* is detected (here, people's names, and where in the text). A policy decides *what to do* with each detection (mask, redact, encrypt, abbreviate, pass through). You need both. Lenses are model artifacts loaded by PhEye; policies are JSON files loaded by Phileas or Philter. They compose at runtime.
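To make the split concrete, the sketch below treats detections as plain spans and a policy as a mapping from entity label to strategy. The dictionary format, the strategy names, and the replacement tokens are assumptions for illustration only; this is not the Phileas policy JSON schema.

```python
# Illustrative only: this dict is NOT the Phileas policy schema.
policy = {"PERSON": "redact", "PHONE": "mask"}


def apply_policy(text, detections, policy):
    """Apply a strategy per label; detections are (start, end, label), end exclusive."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, label in sorted(detections, reverse=True):
        strategy = policy.get(label, "pass")
        if strategy == "redact":
            replacement = "{{" + label + "}}"
        elif strategy == "mask":
            replacement = "*" * (end - start)
        else:
            continue  # "pass": leave the detected text untouched
        out = out[:start] + replacement + out[end:]
    return out


text = "Call Dr. Smith at 555-0100."
detections = [(9, 14, "PERSON"), (18, 26, "PHONE")]
print(apply_policy(text, detections, policy))
```

The detection step only reports where something is; swapping the policy changes the output without touching the detector.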

Can a lens replace a regex?

For names, the lens is the better tool: a trained model finds people's names in context that regex can't reliably match, like "Dr. Smith" or a name buried in a clinical narrative. For structured identifiers (SSNs, credit-card and account numbers, national IDs, phone numbers), a validated regex is faster and more reliable than a model, and that is exactly what Phileas's pattern layer does. Production deployments use both: lenses for names, regex and validators for the structured surface.
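As an example of what a "validated regex" means in general, the sketch below pairs a candidate pattern for card numbers with the Luhn checksum, so digit runs that merely look like card numbers are rejected. It is a generic illustration of the technique, not Phileas's actual pattern or validator code, and the pattern is deliberately simplified.

```python
import re

# Candidate pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def find_card_numbers(text: str):
    """Yield (start, end) spans of regex matches that also pass validation."""
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):
            yield m.span()


sample = "Card 4111 1111 1111 1111 on file; order ref 1234 5678 9012 3456."
print([sample[s:e] for s, e in find_card_numbers(sample)])
```

Both digit groups match the pattern, but only the first passes the checksum, which is the kind of false-positive filtering a bare regex cannot do.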

Does running multiple lenses slow things down?

Slightly. Each loaded lens adds inference time per document, although PhEye parallelizes inference where possible. For typical combinations (2-4 lenses) on CPU, the added latency is in the tens of milliseconds per document, which is not a meaningful cost for most production workloads. Run Philter Scope against your real document distribution if precise numbers matter.
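Independent of any particular tooling, per-document latency can be sanity-checked with a small timing harness like the one below. The detect callable is a placeholder you would replace with a call into your own deployment; fake_detect and the sample documents are stand-ins invented for this sketch.

```python
import math
import statistics
import time


def measure_latency(detect, documents, warmup=3):
    """Return per-document latencies in milliseconds for a detect(text) callable."""
    for doc in documents[:warmup]:
        detect(doc)  # warm caches before timing
    latencies = []
    for doc in documents:
        start = time.perf_counter()
        detect(doc)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Median and nearest-rank 95th percentile of a list of latencies."""
    ordered = sorted(latencies)
    p95_index = math.ceil(0.95 * len(ordered)) - 1
    return {"median_ms": statistics.median(ordered), "p95_ms": ordered[p95_index]}


# Stand-in detector (hypothetical); swap in a call to your own deployment.
def fake_detect(text):
    return [word for word in text.split() if word.istitle()]


docs = ["Dr. Smith saw Marie Dupont on Monday."] * 50
print(summarize(measure_latency(fake_detect, docs)))
```

Reporting a high percentile alongside the median matters because long documents tend to dominate tail latency.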

What does each lens detect?

Every Philterd lens detects person names (PERSON). The Philterd lenses are the same English name detector in four sizes (extra small, small, medium, and large), so you trade accuracy against latency and footprint; community lenses add other languages and domains. Structured identifiers are detected by Phileas's pattern-based layer, not the lenses. Each lens's detail page lists its language, license, and version; community models may cover other entity types, shown in the table above.

Are the lenses open source?

Licensing varies per lens. Open each lens's detail page in the catalog above: the license and any usage notes are listed there alongside the language and version. Custom-trained lenses are deliverables of a paid engagement and are owned per the engagement terms.

Pick the lenses that fit your data

30 minutes with the team to walk through the lenses your workload needs, what to combine, and whether a custom lens makes sense. Bring a sanitized sample of your real documents if you have one.
