PII Lens

French PII

French-language PII detection for documents from France, Belgium, Quebec, and other Francophone jurisdictions — including INSEE, SIREN, and SIRET identifiers.

Status: available
License: Apache-2.0
Version: 1.0.0
Updated: 2026-05-22
PhEye compatibility: >=1.0.0
Languages: fr
Model size: 190 MB
Author: Philterd

Entities detected

PERSON
LOCATION
ORG
INSEE
SIREN
SIRET
PHONE
ADDRESS

When to load this lens

Load this lens for French-language text. Common combinations: with General Purpose for bilingual documents, and with Healthcare for French clinical notes.

Pairs well with

General Purpose: Broad PII baseline for documents that don't fit a specific domain — customer-support tickets, internal correspondence, generic business records. The default lens loaded by PhEye when no other is specified.
Healthcare: Clinical-text lens trained for entities that matter in EHR exports, clinical notes, discharge summaries, and medical-chatbot transcripts — higher recall than general NER on the healthcare-specific surface.

What this lens detects

PII in French-language text, including:

Person names — French naming conventions (single given name common, particle-prefixed surnames such as de la Tour).
Locations and organizations — French postal address format (14 rue de la Paix, 75002 Paris), Belgian and Quebec address conventions.
INSEE numbers — French national identifier (numéro de Sécurité sociale, also known as the NIR): 13 digits followed by a 2-digit control key, 15 characters in total.
SIREN — 9-digit business identifier.
SIRET — 14-digit establishment identifier (SIREN + 5-digit NIC).
Phone numbers — French (+33), Belgian (+32), Swiss (+41), Canadian (+1) formats.
Addresses — French-language address conventions.
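The structured identifiers above carry public checksum rules, which can help when triaging detections downstream. The following is a minimal sketch, not part of the lens or of PhEye: the function names are hypothetical, and it assumes the standard rules (a Luhn checksum for SIREN and SIRET, and a control key of 97 minus the 13-digit body modulo 97 for the INSEE number, with Corsican department codes 2A and 2B replaced by 19 and 18 before the arithmetic). Known special cases, such as La Poste establishments, are not handled.

```python
import re


def luhn_valid(digits: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def siren_valid(value: str) -> bool:
    """Check a 9-digit SIREN (whitespace is ignored)."""
    digits = re.sub(r"\s", "", value)
    return bool(re.fullmatch(r"\d{9}", digits)) and luhn_valid(digits)


def siret_valid(value: str) -> bool:
    """Check a 14-digit SIRET: 9-digit SIREN plus 5-digit NIC."""
    digits = re.sub(r"\s", "", value)
    return bool(re.fullmatch(r"\d{14}", digits)) and luhn_valid(digits)


def insee_key(body: str) -> int:
    """Compute the control key for a 13-character INSEE number body."""
    # Corsican departments: 2A becomes 19 and 2B becomes 18.
    numeric = body.upper().replace("2A", "19").replace("2B", "18")
    return 97 - int(numeric) % 97


def insee_valid(value: str) -> bool:
    """Check a 15-character INSEE number (13-character body + 2-digit key)."""
    compact = re.sub(r"\s", "", value).upper()
    pattern = r"[0-9]{5}(?:[0-9]{2}|2A|2B)[0-9]{6}[0-9]{2}"
    if not re.fullmatch(pattern, compact):
        return False
    return insee_key(compact[:13]) == int(compact[13:])
```

With these helpers, the illustrative value "1 84 12 76 451 089 46" passes insee_valid, and "732 829 320" passes siren_valid. A failed checksum does not prove that a span is not PII (typos and OCR errors are common), so a check like this is better suited to ranking or flagging detections than to discarding them.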

When to use this

Documents from France, Belgium, Switzerland (French regions), Quebec, Luxembourg, and other Francophone jurisdictions.
Bilingual environments — combine with General Purpose for English / French mixed documents (common in Canadian government, EU institutional, multinational HR records).
GDPR-driven workflows. French data-protection enforcement (CNIL) is among the most active in the EU; data residency and minimization are first-order constraints. Self-hosted detection with this lens addresses both.

Known limitations

Quebec French vs European French — coverage is broad on shared vocabulary, but Quebec-specific institutional vocabulary (e.g., RAMQ for the public health-insurance ID) has only partial coverage. RAMQ numbers are recognized, but the lens labels them as a generic IDENTIFIER rather than a typed entity.
African Francophone variants — trained primarily on European and Quebec corpora; coverage on Maghreb / West African French text is functional but less calibrated.
Diacritics. Accent marks matter for many French entities; recall is lower on documents that have been stripped of diacritics.
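To make the diacritics limitation concrete, here is a small sketch using only Python's standard unicodedata module. It shows what accent stripping does to a French string and offers a rough way to measure how many letters in a document still carry accents. Both function names are hypothetical, and any threshold for deciding that a document has been stripped is left as an assumption for the reader to calibrate on their own corpus.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks, as lossy export pipelines sometimes do."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def diacritic_ratio(text: str) -> float:
    """Return the share of letters that carry at least one combining mark."""
    # Compose first so each accented letter is a single code point.
    composed = unicodedata.normalize("NFC", text)
    letters = 0
    marked = 0
    for ch in composed:
        if not ch.isalpha():
            continue
        letters += 1
        parts = unicodedata.normalize("NFD", ch)
        if any(unicodedata.combining(c) for c in parts):
            marked += 1
    return marked / letters if letters else 0.0
```

For example, strip_diacritics("Hélène Lefèvre, 3 allée des Érables") yields "Helene Lefevre, 3 allee des Erables". A ratio of 0.0 on a long French document is a hint that accents were lost upstream, in which case detections from this lens may deserve closer review.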

Use this lens with PhEye, Phileas, or Philter

PhEye loads this lens at configuration time and exposes it to Phileas and Philter automatically. Have questions about a specific deployment? Talk to the team.
