PII Lens

Spanish PII

Spanish-language PII detection for documents written in Spanish — Spain and Latin American name patterns, address formats, and national-ID identifiers (DNI, NIE, CIF).

Status available
License Apache-2.0
Version 1.0.0
Updated 2026-05-22
PhEye compatibility >=1.0.0
Languages es
Model size 190 MB
Author Philterd

Entities detected

PERSON
LOCATION
ORG
DNI
NIE
CIF
PHONE
ADDRESS

When to load this lens

Load this lens for Spanish-language text. Combine with the General Purpose lens when documents are bilingual (English headers, Spanish narrative — common in US healthcare and immigration contexts).

Pairs well with

General Purpose: Broad PII baseline for documents that don't fit a specific domain — customer-support tickets, internal correspondence, generic business records. The default lens loaded by PhEye when no other is specified.
Healthcare: Clinical-text lens trained for entities that matter in EHR exports, clinical notes, discharge summaries, and medical-chatbot transcripts — higher recall than general NER on the healthcare-specific surface.

What this lens detects

PII in Spanish-language text, including:

Person names — Spanish naming conventions (often two given names, two surnames). Pattern-matching that’s tuned for general English NER misses many Spanish names entirely.
Locations and organizations — Spanish address formats (Calle Mayor, 14, 3º A, 28013 Madrid), Latin American conventions (street numbers after street names), municipality and province names.
DNI — Spanish national identity document, format 12345678A.
NIE — Foreign-resident identifier in Spain, format X1234567A.
CIF — Spanish business tax ID, format A12345678.
Phone numbers — Spanish (+34), Mexican (+52), and other Spanish-speaking-country phone formats.
Addresses — Spanish-language address conventions.

When to use this

Healthcare encounters with Spanish-speaking patients in the U.S. — many EHR systems carry chart notes in Spanish even when the rest of the record is in English. Load alongside Healthcare.
Spanish-language contact-center transcripts — Spanish-speaking customer-service traffic in Latin American markets.
Documents from Spain, Mexico, Argentina, Colombia, Chile, Peru, Venezuela, and other Spanish-speaking jurisdictions.
Bilingual document workflows — combine with General Purpose so the lens stack handles both languages on the same documents.

Known limitations

Spelling variants in Latin American Spanish vs Iberian Spanish — coverage is broad but specific regional vocabulary in things like address tags may have lower recall.
Mixed-language sentences — code-switching (English / Spanish in the same sentence, common in U.S. bilingual contexts) is detected acceptably but specialized prompts to the NLP model give better recall.
Diacritics matter. Documents that have been stripped of accent marks (some legacy EHRs do this) reduce recall on named entities — the lens recognizes both forms but it’s calibrated against well-formed text.

Use this lens with PhEye, Phileas, or Philter

PhEye loads this lens at configuration time and exposes it to Phileas and Philter automatically. Have questions about a specific deployment? Talk to the team.

About PhEye →