Specialized PII detection models

PII Lenses

A lens is a specialized AI / NLP model trained to find a specific kind of PII or PHI — clinical entities, hospital names, room numbers, document identifiers, foreign-language names, and other categories where a generic model misses too much. Lenses snap into PhEye (and through it into Phileas and Philter) at configuration time — no code changes, no separate service.

Why lenses, not one model

A single general-purpose PII model has to be a generalist. It hits the obvious entities (names, SSNs, phone numbers, emails) at acceptable precision but loses recall on the domain-specific identifiers that matter most to the buyer. Lenses fix this by training a small, fast model on a focused domain — with three concrete benefits over one big model:

Higher recall on domain entities

A healthcare lens trained on clinical text catches hospital names, hospital room numbers, medication mentions, dose units, and clinical abbreviations that generic NER models silently miss. Recall on the long-tail entities is where a workload either passes or fails an audit.

Lower compute cost

Each lens is purpose-built and small: tens to hundreds of MB, not the gigabytes of a general LLM. Inference runs on CPU via ONNX. At scale, the cost difference versus calling a hosted LLM for the same task is one to two orders of magnitude.

Composable at runtime

Run more than one lens at a time and union the detections. Healthcare + multilingual + your own custom-trained lens on the same document, in one call. Each lens contributes its specialty; the policy engine merges and applies filter strategies.

Available lenses

Lenses are versioned and shipped with PhEye. Use any of them out of the box; combine multiple lenses on the same workload; or work with us to train one for your data. The catalog below is synced from the open source pheye-pii-lenses repository — each card links to a full detail page.

COVID-19

available

TEST_RESULT · VACCINE_BATCH · VARIANT · PANDEMIC_CLINICAL_TERM

Pandemic-era documents have a vocabulary that pre-2020 healthcare models don't fully cover. Use this lens alongside Healthcare for clinical text from 2020 onward.

License Apache-2.0

Read the details →

PAN · ROUTING_NUMBER · ACCOUNT_NUMBER · SWIFT · IBAN · BROKERAGE_ACCOUNT

Financial-account-aware detection that validates structure before redacting: the Luhn checksum for card numbers, country-coded check digits for IBANs. Typically used alongside the General Purpose lens or in contact-center workloads.

License Apache-2.0

Read the details →
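The two structural checks named in the financial-account card above are public algorithms, so they can be illustrated without reference to the lens itself. The following is a minimal sketch under stated assumptions, not the lens's implementation: the function names `luhn_valid` and `iban_valid` are illustrative, the 12-digit minimum length is an arbitrary cutoff, and the IBAN check applies only the generic mod-97 rule without per-country length tables.

```python
import re


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 12:  # assumption: shorter strings are not card numbers
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """Return True if `iban` passes the generic mod-97 check-digit test."""
    s = re.sub(r"\s+", "", iban).upper()
    # Two-letter country code, two check digits, then the account part.
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", s):
        return False
    # Move the first four characters to the end, map A..Z to 10..35.
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"))
    print(iban_valid("GB82 WEST 1234 5698 7654 32"))
```

A candidate span that fails its checksum is usually a false positive (an order number, a tracking ID), which is why validating before redacting tends to improve precision on structured identifiers.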

French PII

available

PERSON · LOCATION · ORG · INSEE · SIREN · SIRET · PHONE · ADDRESS

French-language PII detection for documents from France, Belgium, Quebec, and other Francophone jurisdictions — including INSEE, SIREN, and SIRET identifiers.

License Apache-2.0

Read the details →

General Purpose

PERSON · LOCATION · ORG · DATE · PHONE · EMAIL · URL · SSN

Broad PII baseline for documents that don't fit a specific domain — customer-support tickets, internal correspondence, generic business records. The default lens loaded by PhEye when no other is specified.

License Apache-2.0

Read the details →

German PII

available

PERSON · LOCATION · ORG · STEUER_ID · PERSONALAUSWEIS · PHONE · ADDRESS

German-language PII detection for documents from Germany, Austria, and Switzerland — including Steuer-ID and Personalausweis identifiers.

License Apache-2.0

Read the details →

Healthcare

available

PERSON · PROVIDER · HOSPITAL · MEDICATION · DOSE_UNIT · SYMPTOM · CLINICAL_ABBREVIATION · MRN · DATE

Clinical-text lens trained for entities that matter in EHR exports, clinical notes, discharge summaries, and medical-chatbot transcripts — higher recall than general NER on the healthcare-specific surface.

License Apache-2.0

Read the details →

Hospital Identifiers

HOSPITAL · CLINIC · DEPARTMENT · ROOM_NUMBER · BED_IDENTIFIER · WARD

Narrower healthcare-adjacent lens for environments where hospital and room identifiers are the binding constraint — bed-management systems, patient-flow analytics, discharge planning tools.

License Apache-2.0

Read the details →

CASE_NUMBER · DOCKET_NUMBER · BAR_NUMBER · BATES_NUMBER · PARTY_ROLE · COURT

Specialized for legal-document workflows where case-management identifiers and Bates-style productions need to be recognized alongside the standard PII set.

License Apache-2.0

Read the details →

PERSON · LOCATION · ORG · CPF · CNPJ · PHONE · ADDRESS

Portuguese-language PII detection covering Brazilian and Portuguese conventions, including CPF and CNPJ tax identifiers.

License Apache-2.0

Read the details →

Spanish PII

available

PERSON · LOCATION · ORG · DNI · NIE · CIF · PHONE · ADDRESS

Spanish-language PII detection for documents written in Spanish — Spain and Latin American name patterns, address formats, and national-ID identifiers (DNI, NIE, CIF).

License Apache-2.0

Read the details →

Combining lenses

Lenses are designed to be loaded together. A clinical-research team running on EHR data with Spanish-speaking patient narratives loads the Healthcare + Hospital Identifiers + Spanish PII lenses; PhEye serves the union of their detections on every request. The policy engine then decides which detections to act on and how, based on the configured filter strategies.

Lens loading happens at PhEye configuration time, not per-request. Switching lenses on a workload is a config change and a model reload — not a code change in the calling application.
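To make "the union of their detections" concrete, here is a minimal sketch of merging per-lens results. Everything in it is an assumption for illustration: the `Detection` fields, the lens identifiers, the confidence values, and the rule that exact duplicates keep the higher-confidence copy. It does not describe PhEye's internal data model or its actual merge logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    lens: str          # hypothetical: which lens produced the span
    entity_type: str
    start: int         # character offset, inclusive
    end: int           # character offset, exclusive
    confidence: float


def union_detections(per_lens):
    """Union detections from several lenses, dropping exact duplicates.

    Two detections are duplicates when they share span and entity type;
    the higher-confidence one is kept.
    """
    best = {}
    for detections in per_lens:
        for d in detections:
            key = (d.start, d.end, d.entity_type)
            if key not in best or d.confidence > best[key].confidence:
                best[key] = d
    return sorted(best.values(), key=lambda d: (d.start, d.end, d.entity_type))


def find(text, phrase):
    """Return (start, end) offsets of the first occurrence of phrase."""
    start = text.index(phrase)
    return start, start + len(phrase)


text = "Maria Lopez was moved to Room 412B at St. Mary Hospital."
healthcare = [Detection("healthcare", "PERSON", *find(text, "Maria Lopez"), 0.91)]
hospital = [
    Detection("hospital-identifiers", "ROOM_NUMBER", *find(text, "Room 412B"), 0.88),
    Detection("hospital-identifiers", "HOSPITAL", *find(text, "St. Mary Hospital"), 0.93),
]
spanish = [Detection("spanish-pii", "PERSON", *find(text, "Maria Lopez"), 0.97)]

merged = union_detections([healthcare, hospital, spanish])
for d in merged:
    print(d.entity_type, repr(text[d.start:d.end]), d.lens)
```

The sketch only removes exact duplicates. Partially overlapping spans are left in place, on the assumption that deciding between them belongs to the policy stage rather than to the union step.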

Need a lens that doesn't exist yet?

The published lenses cover the workloads we see most often. When a customer has a domain we haven’t trained for — insurance claims terminology, contact-center spoken-language patterns, agricultural records, manufacturing field-service notes, a specific country’s national-ID formats — we can train a custom lens.

The shape of that work:

  • You provide a representative annotated dataset (or we help build one from a sample of your real documents under appropriate agreements).
  • We train and evaluate the lens against a held-out test set; you see precision / recall numbers per entity type.
  • Deployment is the same as any other lens: load it into PhEye and it’s available to Phileas and Philter through the standard configuration.
  • The trained lens stays yours under the engagement terms: you can deploy it however you need, including in environments where PhEye is not running.
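The per-entity precision and recall numbers mentioned in the evaluation step can be computed in several ways. The sketch below assumes the strictest one, exact span matching, where a prediction counts only if document, offsets, and entity type all agree with the annotation. The function name and the tuple layout are illustrative, not the format used in an actual engagement.

```python
from collections import Counter


def per_entity_scores(gold, predicted):
    """Exact-match precision and recall per entity type.

    gold, predicted: iterables of (doc_id, start, end, entity_type).
    """
    gold_set, pred_set = set(gold), set(predicted)
    tp = Counter(s[3] for s in gold_set & pred_set)   # found and correct
    fp = Counter(s[3] for s in pred_set - gold_set)   # predicted, not annotated
    fn = Counter(s[3] for s in gold_set - pred_set)   # annotated, not predicted
    scores = {}
    for etype in sorted(set(tp) | set(fp) | set(fn)):
        p_den = tp[etype] + fp[etype]
        r_den = tp[etype] + fn[etype]
        scores[etype] = {
            "precision": tp[etype] / p_den if p_den else 0.0,
            "recall": tp[etype] / r_den if r_den else 0.0,
        }
    return scores


gold = [
    ("d1", 0, 5, "PERSON"),
    ("d1", 10, 14, "ROOM_NUMBER"),
    ("d2", 3, 9, "PERSON"),
]
predicted = [
    ("d1", 0, 5, "PERSON"),
    ("d1", 10, 14, "ROOM_NUMBER"),
    ("d2", 3, 8, "PERSON"),  # boundary off by one: counts as a miss
]

for etype, s in per_entity_scores(gold, predicted).items():
    print(etype, s)
```

Exact matching penalizes boundary errors twice (one false positive, one false negative). For redaction workloads a relaxed overlap criterion may be more representative, so it is worth agreeing on the matching rule before comparing numbers.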

Custom lens engagements are part of the Engaged tier. Talk to us with a sample of the data and a description of what generic models miss; we’ll tell you whether a custom lens is the right answer.

FAQ

Is a lens the same as a policy?
No. A lens decides *what* is detected (which entity types and where in the text). A policy decides *what to do* with each detection (mask, redact, encrypt, abbreviate, pass through). You need both. Lenses are model artifacts loaded by PhEye; policies are JSON files loaded by Phileas or Philter. They compose at runtime.
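A small sketch of that split, under loud assumptions: the detections stand in for lens output, and the `policy` dictionary is a hypothetical simplification invented for this example. It is not the Phileas or Philter policy JSON schema, and only three of the strategies named above (mask, redact, pass through) are modeled.

```python
def apply_policy(text, detections, policy, default="redact"):
    """Apply a per-entity-type strategy to each detected span.

    detections: list of (start, end, entity_type), assumed non-overlapping.
    policy: dict mapping entity_type -> "mask" | "redact" | "pass".
    """
    out = text
    # Work from the end of the text so earlier offsets stay valid.
    for start, end, etype in sorted(detections, reverse=True):
        strategy = policy.get(etype, default)
        if strategy == "pass":
            continue
        if strategy == "mask":
            replacement = "*" * (end - start)
        elif strategy == "redact":
            replacement = f"[{etype}]"
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        out = out[:start] + replacement + out[end:]
    return out


text = "Dr. Smith saw the patient in Room 412B."

# What was detected (the lens side), with offsets computed from the text.
name_start = text.index("Dr. Smith")
room_start = text.index("Room 412B")
detections = [
    (name_start, name_start + len("Dr. Smith"), "PERSON"),
    (room_start, room_start + len("Room 412B"), "ROOM_NUMBER"),
]

# What to do with each detection (the policy side).
policy = {"PERSON": "redact", "ROOM_NUMBER": "mask"}

print(apply_policy(text, detections, policy))
```

Changing the `policy` mapping changes the output without touching the detections, and swapping the detections changes what is found without touching the policy, which is the separation the answer above describes.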
Can a lens replace a regex?
For some entities, yes. A trained model finds names and contextual mentions that regex can't — "Dr. Smith," "Room 412B," "three weeks postpartum." For other entities (SSNs, credit-card numbers, structured account numbers), a validated regex is faster and more reliable than a model and you should keep using it. Most production deployments use both: lenses for the unstructured surface, regex + validators for the structured surface.
Does running multiple lenses slow things down?
Slightly. Each loaded lens adds inference time per document, and PhEye parallelizes inference where possible. For typical combinations of 2-4 lenses on CPU, the added latency is in the tens of milliseconds per document, which is not a meaningful cost for most production workloads. Run Philter Scope against your real document distribution if precise numbers matter.
Where can I see the entity types each lens detects?
The per-lens entity reference is in the PhEye documentation. The catalog cards above are summaries; each lens's detail page has the full list.
Are the lenses open source?
Licensing is set per lens. Every lens in the catalog above is currently listed under Apache-2.0; open a lens's detail page to confirm the license and read any usage notes, which are listed alongside the entity coverage and version. Custom-trained lenses are deliverables of a paid engagement and are owned per the engagement terms.

Pick the lenses that fit your data

30 minutes with the team to walk through the lenses your workload needs, what to combine, and whether a custom lens makes sense. Bring a sanitized sample of your real documents if you have one.

About PhEye →