PII Lens

Portuguese PII

Portuguese-language PII detection covering Brazilian and Portuguese conventions, including CPF and CNPJ tax identifiers.

Status: available
License: Apache-2.0
Version: 1.0.0
Updated: 2026-05-22
PhEye compatibility: >=1.0.0
Languages: pt
Model size: 190 MB
Author: Philterd

Entities detected

PERSON
LOCATION
ORG
CPF
CNPJ
PHONE
ADDRESS

When to load this lens

Load this lens for Portuguese-language text. Brazilian fintech, healthcare, and contact-center workloads commonly pair it with General Purpose.

Pairs well with

General Purpose: Broad PII baseline for documents that don't fit a specific domain — customer-support tickets, internal correspondence, generic business records. The default lens loaded by PhEye when no other is specified.
Healthcare: Clinical-text lens trained for entities that matter in EHR exports, clinical notes, discharge summaries, and medical-chatbot transcripts — higher recall than general NER on the healthcare-specific surface.

What this lens detects

PII in Portuguese-language text, including:

Person names — Brazilian and Portuguese naming conventions (often multiple surnames including a mother’s family name).
Locations and organizations — Brazilian and Portuguese address formats and city / state / district names.
CPF — Brazilian individual taxpayer identifier, format 123.456.789-09 (11 digits with checksum).
CNPJ — Brazilian business identifier, format 12.345.678/0001-99 (14 digits with checksum).
Phone numbers — Brazilian (+55), Portuguese (+351), and other Lusophone phone formats.
Addresses — Portuguese-language address conventions including CEP (Brazilian postal code) patterns.
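
The "with checksum" part of the CPF and CNPJ entries refers to two mod-11 check digits at the end of each number. The following is a minimal Python sketch of that standard arithmetic, useful for sanity-checking fixtures or detections downstream. It is not the lens's own implementation, which this page does not describe, and the function names are illustrative.

```python
import re

# Weights for the two CNPJ check digits, applied left to right.
CNPJ_WEIGHTS_1 = [5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2]
CNPJ_WEIGHTS_2 = [6] + CNPJ_WEIGHTS_1


def check_digit(digits, weights):
    """Mod-11 check digit: a remainder below 2 maps to 0, otherwise 11 - remainder."""
    remainder = sum(d * w for d, w in zip(digits, weights)) % 11
    return 0 if remainder < 2 else 11 - remainder


def only_digits(value):
    """Drop punctuation such as '.', '-', and '/' and return the digits as ints."""
    return [int(c) for c in re.sub(r"\D", "", value)]


def is_valid_cpf(value):
    d = only_digits(value)
    if len(d) != 11:
        return False
    first = check_digit(d[:9], range(10, 1, -1))    # weights 10..2
    second = check_digit(d[:10], range(11, 1, -1))  # weights 11..2
    return d[9:] == [first, second]


def is_valid_cnpj(value):
    d = only_digits(value)
    if len(d) != 14:
        return False
    first = check_digit(d[:12], CNPJ_WEIGHTS_1)
    second = check_digit(d[:13], CNPJ_WEIGHTS_2)
    return d[12:] == [first, second]


print(is_valid_cpf("123.456.789-09"))
print(is_valid_cnpj("12.345.678/0001-95"))
```

The CPF example in the list passes this check. The CNPJ example illustrates the layout only: under this arithmetic the check digits for 12.345.678/0001 are 95, so 12.345.678/0001-95 validates and 12.345.678/0001-99 does not.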

When to use this

Documents from Brazil, Portugal, Angola, Mozambique, and other Lusophone jurisdictions.
Brazilian fintech — CPF / CNPJ are the binding identifiers in most financial workflows; recognizing them with checksum validation reduces both false positives and missed detections.
LGPD-driven workflows. Brazil’s data-protection law (LGPD) mirrors many GDPR principles; the lens supports the self-hosted detection posture that such compliance programs often call for.
Bilingual / multilingual environments — combine with General Purpose for English / Portuguese mixed documents.

Known limitations

Brazilian Portuguese vs European Portuguese. The two variants share most vocabulary but diverge on some entity-typical phrasing. The lens is trained primarily on Brazilian Portuguese; European Portuguese gets functional but slightly reduced recall.
African Portuguese variants — coverage is functional but not specifically calibrated for African Portuguese institutional vocabulary.
CPF / CNPJ checksum validation. The lens validates checksums at recognition time, but well-formed fake numbers (for example, all-zero test fixtures) are still recognized as PII, including ones that fail the checksum. If test data matters in your workflow, filter those patterns in the policy layer.
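
As an illustration of filtering test-data patterns downstream, here is a minimal Python sketch that drops CPF and CNPJ detections made of a single repeated digit. The `Detection` shape and the function names are hypothetical; this page does not document the actual policy-layer interface, so treat the sketch as a stand-in for whatever filtering hook your deployment provides.

```python
import re
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detection record: entity label plus the matched text."""
    entity: str
    text: str


def is_test_fixture(detection):
    """True for CPF/CNPJ values made of one repeated digit, e.g. 000.000.000-00."""
    if detection.entity not in {"CPF", "CNPJ"}:
        return False
    digits = re.sub(r"\D", "", detection.text)
    return len(digits) > 0 and len(set(digits)) == 1


def drop_test_fixtures(detections):
    """Keep every detection except repeated-digit CPF/CNPJ fixtures."""
    return [d for d in detections if not is_test_fixture(d)]


detections = [
    Detection("CPF", "000.000.000-00"),
    Detection("CPF", "123.456.789-09"),
    Detection("PHONE", "+55 11 0000-0000"),
]
kept = drop_test_fixtures(detections)
print([d.text for d in kept])
```

Only the CPF and CNPJ labels are filtered, so other entity types pass through unchanged even if they contain repeated digits.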

Use this lens with PhEye, Phileas, or Philter

PhEye loads this lens at configuration time and exposes it to Phileas and Philter automatically. Have questions about a specific deployment? Talk to the team.
