PII & PHI Redaction Guides and Best Practices
Practical guides on PII and PHI redaction: self-hosted pipelines, LLM and RAG privacy, HIPAA and GDPR, and de-identification, from the Philter maintainers.
The Phileas Trino Connector Is Now Open Source
The Phileas Trino connector is now open source under Apache 2.0. How to install and register the plugin, a worked SQL redaction example, and ops details.
How to Tell if a PII Redaction Model Is Any Good
A model count says nothing about quality. What proves a PII redaction model works: held-out evaluation, precision and recall, and an auditable model card.
Precision, Recall, or F1: Which Redaction Metric Matters Most in Your Industry
The right redaction metric depends on your industry. See how healthcare, legal, finance, marketing, and research each prioritize precision, recall, or F1.
Redact PII From Your AI Agent in One Block of Config
Philter MCP exposes PII and PHI redaction as Model Context Protocol tools that Claude Desktop, Claude Code, Cursor, and other MCP clients call mid-conversation.
How to Deploy Philter in 5 Minutes on the AWS Marketplace
Subscribe, launch into your own VPC, and redact PII in minutes. A step-by-step guide to deploying the self-hosted Philter API from the AWS Marketplace.
How We Built PhEye's PII Name Models
Four open GLiNER name detectors for PhEye, from a 90 MB xsmall to a high-capacity large: why we built name-only models, how we trained them, and how they fit.
How We Use Philter Scope to Evaluate Our PII Models
How we use Philter Scope, our open source auditing tool, to evaluate our own PII models, shown on the ph-eye-pii-en name models.
Multilingual Medical PII Redaction: English and French with Philter
How to redact PII and PHI from English and French clinical text with Philter using one engine and a policy per language, with runnable examples.
Cutting False Positives on National IDs with Checksum Validation
Phileas can now validate national and financial IDs by checksum, so a custom identifier rejects format-valid look-alikes and redacts fewer false positives.
philter-sdk-java 2.0.0: Built for Philter 4.0.0
philter-sdk-java 2.0.0 is on Maven Central and targets the Philter 4.0.0 API: API-key auth, full endpoint coverage, and Java 11 bytecode.
Phileas 4.0.0
Phileas 4.0.0 is on Maven Central: authenticated AES-GCM encryption, ReDoS guards, faster span handling, PhiSQL policy input, and a Java 25 build.
How to Redact PII Before Sending to an LLM: Chat, RAG, and AI Agents
Redact PII and PHI before any prompt reaches an LLM, keeping data on your own infrastructure. A self-hosted pipeline for chat, RAG, and AI agents.
Introducing PhiSQL: The Query Language for PII Operations
PhiSQL is a declarative, SQL-like query language for PII privacy operations across the Philterd toolkit. The problem it solves and what ships in v0.1.
PhEye Update: Unified Branch, GPU Support, and Streamlined Testing
PhEye consolidates all model branches into one main branch, adds GPU-accelerated Docker images, and ships a one-command smoke test for every model variant.
Introducing Arbiter: Human-in-the-Loop PII Redaction
Automation handles most of the volume; humans handle the last few percent. Arbiter is the open source review surface that bridges the two, built on Philter.
The TCO of "Free" Cloud PII Redaction: AWS Comprehend, Google DLP, vs Self-Hosted at Scale
Per-character SaaS pricing looks cheap at demo scale and costly in production. A TCO comparison of AWS Comprehend, Google Cloud DLP, and self-hosted Philter.
PII Masking and Redaction in Trino with the Phileas Connector
Mask and redact PII column by column inside Apache Trino with the open source Phileas connector: install it, redact in SQL, and federate across sources.
Redaction for Legal and E-Discovery: Privilege, Rule 9037, and the In-House Counsel's Pipeline
How automated redaction fits legal workflows: court filings, e-discovery production, privilege review, and M&A due diligence for in-house counsel.
The Ethics of Training: Why We Use Synthetic Data
A privacy tool should never be trained on the data it protects. Why Philterd's models are built entirely on synthetic data, and what that means for compliance.
Building a HIPAA-Compliant Medical Chatbot
Why generic RAG chatbots fail HIPAA, and a blueprint for building a medical chatbot that satisfies Safe Harbor at ingestion, retrieval, and inference.
Building a Privacy-Aware RAG System
RAG pipelines have two distinct PII leak vectors: ingestion and inference. A defense-in-depth blueprint with code, using Philter and the Philter AI Proxy.
Redaction for Financial Services: PCI DSS, GLBA, and the Real-World Data Pipeline
A practitioner's guide to redacting NPPI and cardholder data in financial workflows, mapping PCI DSS, GLBA, and state requirements to the Philterd toolkit.
Architecting Privacy in Kafka: Real-Time Redaction for Streaming Data
Three battle-tested patterns for redacting PII inside Apache Kafka pipelines: Phileas as an embedded library, Philter over HTTP, or a Kafka Connect transform.
Beyond Regex: Why General LLMs Fail at PII Discovery
Regex misses context; general LLMs over-redact and burn GPUs. The right answer is hybrid: pattern matching for the deterministic, specialized AI for the rest.
Compliance as Code: Integrating Philter into Your CI/CD Pipeline
Treat data privacy like a unit test. Wire Philter into GitHub Actions, GitLab CI, and pre-commit hooks so PII leaks fail the build, not production.
Migrating from AWS Comprehend to Philter: A Practical Transition Guide
A side-by-side guide for teams migrating PII detection from AWS Comprehend to self-hosted Philter: API translations, code samples, and a shadow-mode cutover.
Redaction for Insurance: Claims, Customer Data, and the State-by-State Patchwork
Insurance carriers sit at the intersection of GLBA, HIPAA, state rules, and the NAIC Model Law. A guide to redacting NPPI and PHI in claims and adjuster notes.
Open Source vs. Black Box: Why You Can't Afford "Trust Me" Privacy
For a CISO, "trust me" is not a strategy. Why auditable open source is the new enterprise standard for PII redaction, and what it means for compliance.
Automating HIPAA Safe Harbor: A Blueprint for Healthcare Data Pipelines
How the Philterd suite maps to the 18 HIPAA Safe Harbor identifiers, with a deployment blueprint for patient data lakes, research pipelines, and medical RAG.
Privacy Shouldn't Be a Guessing Game: Evaluating Redaction with Philter Scope
Stop hoping your redaction works. Philter Scope turns precision, recall, and F1 into a measurable, auditable health score for any redaction pipeline.
Why API-Based Redaction is a Security Antipattern
Sending sensitive data to a third-party redaction API opens the security holes you are trying to close. Why data sovereignty needs a self-hosted engine.
Redaction for Government and Federal Workloads: FedRAMP, CMMC, ITAR, and the Air-Gap Imperative
Why most commercial PII redaction tools fail federal workloads, and how Philterd's self-hosted architecture maps to FedRAMP, CMMC, ITAR, and air-gapped needs.
Deploying Philter in Air-Gapped Environments
Deploy Philter and the full Philterd toolkit into completely offline VPCs and disconnected cloud regions. No phone-home, no telemetry, no external dependencies.
From Phileas to Philter: The Evolution of Our Open Source Engine
How a focused open source experiment grew into the engine behind a full enterprise PII suite, and why both Phileas and Philter still ship independently.
Redaction for Education: FERPA, Student Records, and Research Data Pipelines
FERPA governs student records but rarely gets the attention HIPAA does. A practitioner's guide for university IT, edtech vendors, and research-data teams.
Snowflake PII Redaction: A Practical Integration Guide
Three production-grade patterns for redacting PII inside Snowflake: external functions, Java UDFs, and ETL-stage redaction, with code and trade-offs.
Redacting PII and PHI in OpenSearch Data Prepper Pipelines
Add a Phileas or Philter redaction step to an OpenSearch Data Prepper pipeline so PII and PHI are removed before data lands in your search index.
Philter 3.1.0
Philter 3.1.0 is now on the AWS, Google Cloud, and Azure marketplaces. Built on Phileas 2.12.0 with filter priorities, zip-code validation, and context windows.
Phileas 2.12.0
Phileas 2.12.0, the popular open source redaction library, is released. A new Philter built on it is coming to the AWS, Google Cloud, and Azure marketplaces.
Why Using an LLM to Redact PII and PHI is a Bad Idea
Lots of posts show how to redact PII and PHI from text with a large language model (LLM). Can we really just let an LLM handle it? Here is why that is a bad idea.
Shielding Your Search: Redacting PII and PHI in Elasticsearch with the Search Redact Plugin
The Search Redact plugin for Elasticsearch, built on Phileas, redacts PII and PHI from your search results so sensitive data stays private.
Shielding Your Search: Redacting PII and PHI in OpenSearch with the Search Redact Plugin
The Search Redact plugin for OpenSearch, built on Phileas, redacts PII and PHI from your search results so sensitive data stays private.
Phileas 2.10.0
Phileas 2.10.0 is released. This version removes the commons-csv and guava dependencies, adds a bloom filter, updates pdfbox to 3.0, and includes fixes.
Phileas in Graylog – Removing PII from Logs
Graylog has integrated Phileas, the open source PII and PHI redaction engine, into its log management platform to identify and redact sensitive data in logs.
Phileas 2.9.1
Phileas 2.9.1 is released: a new line separator in LineWidthSplitService, empty ph-eye spans no longer signal failure, and a default PhEyeConfiguration value.
Automatically Redacting PII and PHI from Files in Amazon S3 using Amazon Macie and Philter
Use Amazon Macie to find sensitive data in S3, then automatically redact PII and PHI such as SSNs and phone numbers from those files with Philter.
Philter as an AI Policy Layer
An AI policy layer inspects AI-generated text to prevent sensitive information from being exposed, removing names, addresses, telephone numbers, and more.
Phileas: The Open Source PII and PHI redaction engine
Introducing Phileas, the open source PII and PHI redaction engine, now available under the Apache license on GitHub. It powers both Philter and Phirestream.