PII vs PHI vs NPPI: An Engineer's Guide
Three acronyms used interchangeably that shouldn't be. A reference for engineers and compliance leads, with the regulatory and architectural take on each.
Reference and how-to
Guides to PII and PHI redaction with the Philterd toolkit: fundamentals, writing policies, techniques, AI privacy, self-hosting, integrations, and compliance.
Data redaction removes sensitive information from documents and datasets, but covers more techniques than most realize. A guide to strategies and trade-offs.
NPPI is Nonpublic Personal Information, the financial data GLBA protects. What counts, what's excluded, and how it differs from PII and PHI.
PII is the term everyone uses and few define the same way. A practitioner's guide to what counts as PII, how to find it in real data, and how to handle it.
PhiSQL is a readable query language that compiles to a Phileas redaction policy. An introduction to the syntax and how it maps to the policy JSON.
A how-to for redacting national and financial IDs (SIN, CPF, DNI, IBAN, BIC, and more) with checksum-validated patterns that reject look-alikes.
What a redaction policy is, how the JSON schema is structured, and how to use it to control exactly which PII is detected and how each type is redacted.
A hands-on walkthrough from empty file to working redaction policy: detect an entity, apply it with Philter, change how it redacts, and handle false positives.
A practical map of privacy techniques: redaction, pseudonymization, encryption, generalization, differential privacy, and synthetic data, and when each fits.
PDFs leak redacted text in unexpected ways: invisible text layers, embedded files, and metadata. Why PDF redaction is harder than it looks, with Philter's fix.
Comparing the two main approaches to redacting PII and PHI: an LLM versus pattern-based rules. How each handles accuracy, cost, and GDPR or HIPAA compliance.
Format-preserving encryption (FPE) encrypts a value so the ciphertext keeps its shape and won't break downstream systems. A guide with credit-card examples.
Embeddings look like just numbers, but research shows they are partially invertible. A practical defense guide for vector stores against PII recovery attacks.
Every prompt sent to an LLM is a data egress point. Six concrete patterns for structuring prompts, redacting inputs, and scanning outputs so PII doesn't leak.
Map Philter AI Proxy features to SOC 2 Trust Services Criteria and HIPAA Security Rule safeguards, with guidance for your own attestations.
Run PII detection entirely inside your own boundary: serve the models with PhEye, embed Phileas, or run local ONNX inference, with no data egress.
How to use Philter to redact PII and PHI inside an Apache NiFi data flow, either through Philter's API or with an embedded NiFi processor.
How to redact PII from Java logs before they are written, with a Phileas-backed log4j2 rewrite policy and a logback converter, recursion guard included.
Redact PII column by column in Apache Spark and Databricks with a Phileas UDF: a verified PySpark example, plus the performance trade-offs to plan for.
How to install the open source Phileas connector for Trino, configure a redaction policy, and mask PII column by column in SQL, with a runnable demo.
Amazon Kinesis Firehose is a managed streaming service that moves data from sources to destinations like S3 and Redshift. This post shows how to redact PII in that stream.
How to call Philter from a Microsoft Power Automate (Flow) automation to redact PII and PHI from text, using a simple HTTP action.
How to deploy Philter in AWS with a CloudFormation template: finding the Philter AMI, editing the template, and launching the stack.
How to replace Philter's default self-signed SSL certificate with a signed certificate from a trusted authority, using a Java keystore.
How to run an Apache reverse proxy in front of Philter for SSL termination, access control, and access logging.
How to monitor a Philter deployment in AWS: CloudWatch Logs for application logs, load balancer health checks for availability, and CloudWatch Metrics.
How to configure a Valkey cache so Philter maintains referential integrity (consistent replacement values) across documents and contexts in a cluster.
How to configure a Philter deployment for HIPAA: encryption of data at rest and in motion across AWS, Azure, and Google Cloud.
How Philter addresses PIPEDA and Quebec's Law 25: self-hosted deployment for data residency, bilingual redaction, and Canadian identifier handling.
A self-hosted PII redaction vendor never touches your data, so there is no business-associate or processor relationship to govern. With definitions.