PII vs PHI vs NPPI: An Engineer's Guide
Few acronyms cause more confusion in data privacy than PII, PHI, and NPPI. They overlap, they get used interchangeably, and the regulatory implications of mixing them up are real.
This is the short, definitional reference. Each term gets one paragraph, the regulatory framework that defines it, and the architectural implication for the engineer who has to do something about it.
PII: the umbrella
Personally Identifiable Information is any data that can be used — on its own or in combination with other data — to identify a specific person. Names, email addresses, SSNs, IP addresses, and biometric data are PII. So are quasi-identifiers like zip code + date of birth + gender, which together are sufficient to uniquely identify ~87% of the U.S. population.
Defined by: NIST SP 800-122 in the U.S.; GDPR Article 4(1) in the EU (where it's called "personal data" and has a broader interpretation); state privacy laws (CCPA, VCDPA, CPA, TDPSA).
Engineering implication: PII is the broadest category and the default assumption. Any data-handling pipeline should assume PII is present unless proven otherwise. Use discovery tools like Phinder to find out where it lives, redact it before it propagates with Philter, and measure performance with Philter Scope. A longer practitioner's guide to PII is here.
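The quasi-identifier point above is easy to check on your own data. The following is a minimal sketch, not part of any Philterd tool: it assumes records are plain dictionaries and uses hypothetical field names (`zip`, `dob`, `gender`) and made-up toy rows. It reports the share of records that are the only one with their combination of values.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Return the share of records whose quasi-identifier combination is unique."""
    if not records:
        return 0.0
    # Build one tuple per record from the selected fields.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Hypothetical toy data, not real people.
records = [
    {"zip": "15213", "dob": "1980-02-14", "gender": "F"},
    {"zip": "15213", "dob": "1980-02-14", "gender": "F"},
    {"zip": "15213", "dob": "1975-07-01", "gender": "M"},
    {"zip": "90210", "dob": "1990-11-30", "gender": "F"},
]

print(unique_fraction(records, ["zip", "dob", "gender"]))  # 0.5
print(unique_fraction(records, ["gender"]))                # 0.25
```

A high fraction on a combination of seemingly harmless columns is a sign that those columns need the same treatment as direct identifiers.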
PHI: PII plus health context
Protected Health Information is the subset of PII that's tied to a person's health status, healthcare, or healthcare payment. An SSN by itself is PII; an SSN attached to a diagnosis or a prescription is PHI, and the regulatory exposure jumps substantially.
Defined by: HIPAA, specifically 45 CFR § 160.103. The Safe Harbor de-identification standard (45 CFR § 164.514(b)(2)) enumerates 18 specific identifier categories that must be removed for data to no longer count as PHI.
Engineering implication: PHI workloads have stricter handling requirements than general PII workloads: encryption at rest and in transit, Business Associate Agreements (BAAs) with any vendor that touches the data, audit logging of every access, and a documented de-identification process before data can be used for research or secondary purposes. The Philterd toolkit covers the 18 Safe Harbor identifiers with built-in detectors and custom identifier patterns; we mapped each one in the HIPAA blueprint.
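Some Safe Harbor categories call for generalizing a value rather than deleting it: ZIP codes are cut to their first three digits (or set to 000 when that three-digit area holds 20,000 or fewer people), dates are reduced to the year, and ages over 89 are folded into a single 90-or-older bucket. The sketch below illustrates only those three rules. The function names are hypothetical, and `RESTRICTED_ZIP3` holds placeholder values; a real list has to be derived from current Census data.

```python
from datetime import date

# Placeholder values for illustration only. The real set of low-population
# 3-digit ZIP prefixes must be derived from current Census data.
RESTRICTED_ZIP3 = frozenset({"036", "059"})


def generalize_zip(zip_code, restricted=RESTRICTED_ZIP3):
    """Keep the first three digits, or return '000' for low-population areas."""
    zip3 = zip_code.strip()[:3]
    return "000" if zip3 in restricted else zip3


def generalize_date(d):
    """Reduce a date to its year."""
    return str(d.year)


def generalize_age(age):
    """Fold ages of 90 and above into one bucket."""
    return "90+" if age >= 90 else str(age)


print(generalize_zip("15213"))             # 152
print(generalize_zip("03601"))             # 000
print(generalize_date(date(1980, 2, 14)))  # 1980
print(generalize_age(93))                  # 90+
```

This covers three of the 18 categories. It also ignores edge cases, such as a birth year that by itself reveals an age over 89.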
NPPI: PII plus financial context
Nonpublic Personal Information is the subset of PII tied to a person's financial products or services. Account numbers, transaction history, financial statements, tax records, credit decisions, mortgage applications — anything obtained in the course of providing a financial product is NPPI.
Defined by: The Gramm-Leach-Bliley Act (GLBA), specifically the Privacy Rule (16 CFR Part 313) and the Safeguards Rule (16 CFR Part 314, updated 2023). PCI DSS, an industry standard rather than a statute, additionally covers cardholder data.
Engineering implication: NPPI requires the same architectural pattern as PHI — redact at the boundary, restrict raw access to compliance-only systems, audit downstream propagation — but with a different set of identifiers (PANs, account numbers, IBANs, routing codes) and a different overlay of regulations (PCI DSS scope reduction, GLBA Safeguards Rule controls, BSA/AML retention carve-outs). The financial-services pattern walks through it end-to-end.
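Detecting PANs at the boundary usually combines a pattern match with the Luhn checksum, so that arbitrary 16-digit reference numbers are not redacted by mistake. The following is a minimal sketch of that idea, not Philter's implementation. The regular expression, the `[PAN]` replacement token, and the function names are assumptions made for illustration.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens.
PAN_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_pans(text, replacement="[PAN]"):
    """Replace Luhn-valid card number candidates; leave other digit runs alone."""
    def _sub(match):
        return replacement if luhn_valid(match.group(0)) else match.group(0)
    return PAN_CANDIDATE.sub(_sub, text)


# 4111 1111 1111 1111 is a widely used test card number.
print(redact_pans("Card 4111 1111 1111 1111 charged; ref 1234567890123456."))
# Card [PAN] charged; ref 1234567890123456.
```

The checksum reduces false positives but does not eliminate them, since roughly one in ten random digit strings of the right length also passes.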
The overlap, visualized
┌─────────────────────────────────────┐
│                 PII                 │
│     (the broad umbrella; NIST,      │
│          GDPR, state laws)          │
│                                     │
│  ┌──────────────┐   ┌──────────┐    │
│  │     PHI      │   │   NPPI   │    │
│  │   (HIPAA)    │   │  (GLBA,  │    │
│  │              │   │   PCI)   │    │
│  └──────────────┘   └──────────┘    │
│                                     │
└─────────────────────────────────────┘
The two inner sets don't overlap much in practice (a patient's medical record number isn't a banking account number), but the outer umbrella catches both. A general-purpose PII redaction pipeline is the foundation; PHI and NPPI workloads layer on additional regulatory controls.
The practical decision tree
When you're staring at a new data pipeline and wondering which regulatory regime applies, three questions usually resolve it:
- Does this data relate to a person's health, healthcare, or healthcare payment? → PHI. HIPAA applies. Use a Safe Harbor-aligned policy.
- Did this data come to you because you're providing a financial product or service? → NPPI. GLBA applies (plus PCI DSS if it's cardholder data). Use the financial-services policy pattern.
- Otherwise? → General PII. Whichever state laws (CCPA, etc.) and contractual obligations apply to you set the floor.
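The three questions above can be sketched as a first-match function. This is an illustrative sketch only: the `Workload` fields and the returned labels are hypothetical names, and a workload that answers yes to both of the first two questions deserves a closer look than this function gives it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    relates_to_health: bool        # health, healthcare, or healthcare payment
    from_financial_service: bool   # obtained while providing a financial product
    has_cardholder_data: bool = False


def classify(workload):
    """Return (category, frameworks) using the questions in the order asked."""
    if workload.relates_to_health:
        return "PHI", ["HIPAA"]
    if workload.from_financial_service:
        frameworks = ["GLBA"]
        if workload.has_cardholder_data:
            frameworks.append("PCI DSS")
        return "NPPI", frameworks
    # Floor for everything else: state laws and contracts.
    return "PII", ["state privacy laws", "contractual obligations"]


print(classify(Workload(relates_to_health=True, from_financial_service=False)))
# ('PHI', ['HIPAA'])
print(classify(Workload(False, True, has_cardholder_data=True)))
# ('NPPI', ['GLBA', 'PCI DSS'])
```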
The same Philter engine handles all three with different policy files. The difference between a HIPAA-compliant pipeline and a GLBA-compliant pipeline isn't a different deployment — it's a different JSON file (and a different audit story) using the same building blocks.
Why the distinction matters
Three reasons engineers should care about getting the term right:
- Different controls. HIPAA requires BAAs with vendors; GLBA requires a written information security program with specified technical controls; CCPA requires consumer-facing opt-out mechanisms. A "we redact PII" pipeline doesn't automatically satisfy any of these.
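- Different breach notification. HIPAA's Breach Notification Rule applies to any breach of unsecured PHI: affected individuals must be notified within 60 days of discovery, and a breach affecting 500 or more individuals must also be reported to HHS within that same window (smaller breaches can be reported to HHS annually). State NPPI breach notification timelines vary by jurisdiction. Knowing which one applies determines how fast your team has to move when something goes wrong.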
- Different audit deliverables. An OCR (HHS) audit asks for different artifacts than an OCC (banking) audit. The detection logs, policy files, and discovery reports overlap structurally — the specific entity types and retention rules differ.
The toolkit, three policy files
The architectural punchline: you don't need three different redaction systems for the three categories. You need one engine (Philter) with three policy files (general PII, HIPAA Safe Harbor, GLBA/PCI), each tuned to the entity types and replacement strategies that match the regulatory framework. The detection happens in one place; the policy describes the framework.
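To make the "one engine, three policies" idea concrete, here is a deliberately small sketch. It does not use Philter's API or its policy file format. The detector patterns, policy names, and replacement tokens are all hypothetical, and the regular expressions are far too naive for production use.

```python
import re

# One shared set of detectors (the "engine"). Patterns are simplified.
DETECTORS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "mrn": re.compile(r"\bMRN[:# ]?\d{6,10}\b"),
    "routing_number": re.compile(r"\b\d{9}\b"),
}

# Three policies: each picks entity types and a replacement for each.
POLICIES = {
    "general_pii": {"ssn": "[SSN]", "email": "[EMAIL]"},
    "hipaa_safe_harbor": {"ssn": "[SSN]", "email": "[EMAIL]", "mrn": "[MRN]"},
    "glba_pci": {"ssn": "[SSN]", "email": "[EMAIL]", "routing_number": "[ROUTING]"},
}


def redact(text, policy_name):
    """Apply only the detectors that the named policy enables."""
    for entity, replacement in POLICIES[policy_name].items():
        text = DETECTORS[entity].sub(replacement, text)
    return text


sample = "Contact jane@example.com, SSN 123-45-6789, MRN 00123456, routing 021000021."
for name in POLICIES:
    print(name, "->", redact(sample, name))
```

The detection code is identical across the three runs; only the policy dictionary changes which entities are removed and what replaces them.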
If you need help building those three policies for your specific data — or aren't sure which framework applies to a workload you're scoping — get in touch. Most of our consulting engagements start with exactly this question, and we've yet to see a situation where the answer is "all three at once with the same configuration."