PII Lens

French PII

French-language PII detection for documents from France, Belgium, Quebec, and other Francophone jurisdictions — including INSEE, SIREN, and SIRET identifiers.

  • Status: available
  • License: Apache-2.0
  • Version: 1.0.0
  • Updated: 2026-05-22
  • PhEye compatibility: >=1.0.0
  • Languages: fr
  • Model size: 190 MB
  • Author: Philterd

Entities detected

  • PERSON
  • LOCATION
  • ORG
  • INSEE
  • SIREN
  • SIRET
  • PHONE
  • ADDRESS

When to load this lens

Load this lens for French-language text. Common combinations: with General Purpose for bilingual documents, or with Healthcare for clinical notes written in French.

Pairs well with

  • General Purpose — Broad PII baseline for documents that don't fit a specific domain — customer-support tickets, internal correspondence, generic business records. The default lens loaded by PhEye when no other is specified.
  • Healthcare — Clinical-text lens trained for entities that matter in EHR exports, clinical notes, discharge summaries, and medical-chatbot transcripts — higher recall than general NER on the healthcare-specific surface.

What this lens detects

PII in French-language text, including:

  • Person names — French naming conventions (single given name common, particle-prefixed surnames such as de la Tour).
  • Locations and organizations — French postal address format (14 rue de la Paix, 75002 Paris), Belgian and Quebec address conventions.
  • INSEE numbers — French national identification number (NIR, the numéro de Sécurité sociale): 13 digits followed by a 2-digit control key.
  • SIREN — 9-digit business identifier.
  • SIRET — 14-digit establishment identifier (SIREN + 5-digit NIC).
  • Phone numbers — French (+33), Belgian (+32), Swiss (+41), Canadian (+1) formats.
  • Addresses — French-language address conventions.
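The identifier formats above carry public check-digit rules, which can be useful for spot-checking detections downstream. The sketch below is not the lens's own logic (the lens is a trained model, and this page does not describe its internals); it only illustrates the published rules: the NIR control key is 97 minus the 13-digit body modulo 97, with Corsican department codes 2A and 2B substituted by 19 and 18, and SIREN and SIRET numbers use the Luhn algorithm. All function names here are hypothetical, and a passing check only shows that a number is well-formed, not that it was ever issued.

```python
import re


def nir_key(body: str) -> int:
    """Control key for a 13-character NIR body: 97 - (N mod 97)."""
    body = body.upper()
    dept = body[5:7]  # department code sits after sex (1), year (2), month (2)
    if dept in ("2A", "2B"):
        # Corsica: 2A is replaced by 19 and 2B by 18 before the modulo.
        body = body[:5] + ("19" if dept == "2A" else "18") + body[7:]
    if not re.fullmatch(r"[0-9]{13}", body):
        raise ValueError("expected 13 digits (2A/2B allowed as department)")
    return 97 - int(body) % 97


def is_valid_nir(value: str) -> bool:
    """True if a 15-character NIR (body + key) has a matching control key."""
    compact = re.sub(r"\s", "", value)
    if len(compact) != 15 or not re.fullmatch(r"[0-9]{2}", compact[13:]):
        return False
    try:
        return nir_key(compact[:13]) == int(compact[13:])
    except ValueError:
        return False


def luhn_ok(digits: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def is_valid_siren(value: str) -> bool:
    compact = re.sub(r"\s", "", value)
    return bool(re.fullmatch(r"[0-9]{9}", compact)) and luhn_ok(compact)


def is_valid_siret(value: str) -> bool:
    compact = re.sub(r"\s", "", value)
    return bool(re.fullmatch(r"[0-9]{14}", compact)) and luhn_ok(compact)
```

A check like this is best treated as a secondary filter on detected spans, for example to lower the priority of a SIREN candidate that fails Luhn, rather than as a replacement for detection.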

When to use this

  • Documents from France, Belgium, Switzerland (French regions), Quebec, Luxembourg, and other Francophone jurisdictions.
  • Bilingual environments — combine with General Purpose for English / French mixed documents (common in Canadian government, EU institutional, multinational HR records).
  • GDPR-driven workflows. French data-protection enforcement (CNIL) is among the most active in the EU; data residency and minimization are first-order constraints. Self-hosted detection with this lens addresses both.

Known limitations

  • Quebec French vs European French — coverage is broad on shared vocabulary, but Quebec-specific institutional vocabulary (e.g., RAMQ for the public health-insurance ID) has only partial coverage. RAMQ numbers are recognized, but the lens labels them as a generic IDENTIFIER rather than a typed entity.
  • African Francophone variants — trained primarily on European and Quebec corpora; coverage on Maghreb / West African French text is functional but less calibrated.
  • Diacritics — accent marks matter for many French entities; recall drops on documents whose diacritics have been stripped.
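To make the diacritics limitation concrete, the sketch below shows what accent stripping does to French text and offers a rough pre-flight signal for spotting inputs that may have lost their accents upstream. It uses only Python's standard unicodedata module; the function names are hypothetical, and how to interpret the ratio is an assumption to calibrate on your own corpus, not a documented PhEye feature.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose to NFD and drop combining marks (what lossy pipelines often do)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def diacritic_ratio(text: str) -> float:
    """Combining marks per letter; 0.0 for text with no accented letters."""
    decomposed = unicodedata.normalize("NFD", text)
    letters = sum(1 for ch in decomposed if ch.isalpha())
    marks = sum(1 for ch in decomposed if unicodedata.combining(ch))
    return marks / letters if letters else 0.0


original = "Hélène Lefèvre habite à Évry"
stripped = strip_diacritics(original)
print(stripped)
print(diacritic_ratio(original), diacritic_ratio(stripped))
```

A French document of non-trivial length with a ratio of exactly zero was plausibly stripped somewhere upstream. Where possible, it is safer to run detection on the original accented text than to try to restore accents afterwards.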

Use this lens with PhEye, Phileas, or Philter

PhEye loads this lens at configuration time and exposes it to Phileas and Philter automatically. Have questions about a specific deployment? Talk to the team.
