Blog

PII & PHI Redaction Guides and Best Practices

Practical guides on PII and PHI redaction: self-hosted pipelines, LLM and RAG privacy, HIPAA and GDPR, and de-identification, from the Philter maintainers.

July 16, 2026 Philter · Compliance
Designing a HIPAA Architecture on AWS
A practical guide to HIPAA-eligible workloads on AWS: what the BAA covers, where the eligible-services list hurts you, and the LLM blind spot AWS leaves open.
- Hipaa
- Aws
- Healthcare
- Safe-Harbor
- Architecture
July 10, 2026 Philter · Redaction
Introducing Philter Router: The Right Policy for Every File
A folder of documents is rarely uniform. Philter Router reads each file and sends it to the redaction policy and Philter engine built for it.
- Philter-Router
- Routing
- Announcement
July 9, 2026 Philter Desktop · Redaction
Introducing Philter Desktop: Redact Documents Right on Your Windows PC
Philter Desktop is a free Windows app that redacts PII from your PDF, Word, Excel, and email files, right on your computer. Nothing is uploaded.
- Philter-Desktop
- Redaction
- Windows
- Offline
- Announcement
July 8, 2026 Redaction · Architecture
PII Redaction Is a Lifecycle, Not a One-Time Fix
Redacting PII once is not enough. Data keeps flowing and policies drift. Treat privacy as a continuous lifecycle: discover, redact, review, monitor, repeat.
- Lifecycle
- Discovery
- Monitoring
- Architecture
July 7, 2026 Community · Open Source
Our Project Issues Are Now Public
Philterd's project issues have moved from a private backlog into each project's public GitHub repository. Here is what changed, what is next, and how to get involved.
- Community
- Contributing
- Open-Source
June 25, 2026 Phileas · Redaction
The Phileas Trino Connector Is Now Open Source
The Phileas Trino connector is now open source under Apache 2.0. How to install and register the plugin, a worked SQL redaction example, and ops details.
- Trino
- Integration
- Sql
- Announcement
- Open-Source
June 24, 2026 AI · Models · PhEye
How to Tell if a PII Redaction Model Is Any Good
A model count says nothing about quality. What proves a PII redaction model works: held-out evaluation, precision and recall, and an auditable model card.
- Pii
- Evaluation
- Ph-Eye
- Ner
- Methodology
June 19, 2026 Philter · Redaction
Precision, Recall, or F1: Which Redaction Metric Matters Most in Your Industry
The right redaction metric depends on your industry. See how healthcare, legal, finance, marketing, and research each prioritize precision, recall, or F1.
- Benchmarking
- Philter-Scope
- Metrics
- Precision
- Recall
- F1
June 18, 2026 Philter · Redaction
Redact PII From Your AI Agent in One Block of Config
Philter MCP exposes PII and PHI redaction as Model Context Protocol tools that Claude Desktop, Claude Code, Cursor, and other MCP clients call mid-conversation.
- Mcp
- Model-Context-Protocol
- Claude
- Ai-Agents
- Announcement
June 18, 2026 Philter · AWS
How to Deploy Philter in 5 Minutes on the AWS Marketplace
Subscribe, launch into your own VPC, and redact PII in minutes. A step-by-step guide to deploying the self-hosted Philter API from the AWS Marketplace.
- Aws
- Marketplace
- Deployment
- Ec2
- Self-Hosted
June 17, 2026 PhEye · AI · Models
How We Built PhEye's PII Name Models
Four open GLiNER name detectors for PhEye, from a 90 MB xsmall to a high-capacity large: why we built name-only models, how we trained them, and how they fit.
- Ph-Eye
- Gliner
- Pii
- Ner
- Announcement
June 17, 2026 Philter Scope · AI · Models
How We Use Philter Scope to Evaluate Our PII Models
How we use Philter Scope, our open source auditing tool, to evaluate our own PII models, shown on the ph-eye-pii-en name models.
- Philter-Scope
- Ph-Eye
- Evaluation
- Pii
June 16, 2026 Philter · Healthcare
Multilingual Medical PII Redaction: English and French with Philter
How to redact PII and PHI from English and French clinical text with Philter using one engine and a policy per language, with runnable examples.
- Multilingual
- French
- Healthcare
- Phi
- Pii
June 15, 2026 Phileas · Redaction
Cutting False Positives on National IDs with Checksum Validation
Phileas can now validate national and financial IDs by checksum, so a custom identifier rejects format-valid look-alikes and redacts fewer false positives.
- Pii
- Validation
- National-Ids
June 12, 2026 Philter · Redaction
philter-sdk-java 2.0.0: Built for Philter 4.0.0
philter-sdk-java 2.0.0 is on Maven Central and targets the Philter 4.0.0 API: API-key auth, full endpoint coverage, and Java 11 bytecode.
- Release
- Java
- Sdk
June 10, 2026 Phileas
Phileas 4.0.0
Phileas 4.0.0 is on Maven Central: authenticated AES-GCM encryption, ReDoS guards, faster span handling, PhiSQL policy input, and a Java 25 build.
- Release
- Security
- Performance
June 7, 2026 AI · Philter · Redaction
How to Redact PII Before Sending to an LLM: Chat, RAG, and AI Agents
Redact PII and PHI before any prompt reaches an LLM, keeping data on your own infrastructure. A self-hosted pipeline for chat, RAG, and AI agents.
- Llm
- Ai-Proxy
- Rag
- Agents
- Python
- Gdpr
- Compliance
May 28, 2026 Phileas · Redaction
Introducing PhiSQL: The Query Language for PII Operations
PhiSQL is a declarative, SQL-like query language for PII privacy operations across the Philterd toolkit. The problem it solves and what ships in v0.1.
- Phisql
- Sql
- Policy
- Announcement
May 26, 2026 PhEye · Engineering
PhEye Update: Unified Branch, GPU Support, and Streamlined Testing
PhEye consolidates all model branches into one main branch, adds GPU-accelerated Docker images, and ships a one-command smoke test for every model variant.
- Pheye
- Gpu
- Docker
- Releases
May 16, 2026 Philter · Redaction
Introducing Arbiter: Human-in-the-Loop PII Redaction
Automation handles most of the volume; humans handle the last few percent. Arbiter is the open source review surface that bridges the two, built on Philter.
- Arbiter
- Human-in-the-Loop
- Ai-Training
May 14, 2026 AWS · Philter
The TCO of "Free" Cloud PII Redaction: AWS Comprehend, Google DLP, vs Self-Hosted at Scale
Per-character SaaS pricing looks cheap at demo scale and costly in production. A TCO comparison of AWS Comprehend, Google Cloud DLP, and self-hosted Philter.
- Tco
- Comprehend
- Gcp
May 3, 2026 Phileas · Integrations
PII Masking and Redaction in Trino with the Phileas Connector
Mask and redact PII column by column inside Apache Trino with the open source Phileas connector: install it, redact in SQL, and federate across sources.
- Trino
- Integration
- Sql
April 30, 2026 Philter · Redaction
Redaction for Legal and E-Discovery: Privilege, Rule 9037, and the In-House Counsel's Pipeline
How automated redaction fits legal workflows: court filings, e-discovery production, privilege review, and M&A due diligence for in-house counsel.
- Legal
- E-Discovery
- Privilege
April 19, 2026 AI · Philter
The Ethics of Training: Why We Use Synthetic Data
A privacy tool should never be trained on the data it protects. Why Philterd's models are built entirely on synthetic data, and what that means for compliance.
- Ai-Ethics
- Synthetic-Data
- Eu-Ai-Act
April 14, 2026 AI · LLMs · Philter
Building a HIPAA-Compliant Medical Chatbot
Why generic RAG chatbots fail HIPAA, and a blueprint for building a medical chatbot that satisfies Safe Harbor at ingestion, retrieval, and inference.
- Hipaa
- Healthcare
- Rag
April 8, 2026 AI · LLMs · Philter
Building a Privacy-Aware RAG System
RAG pipelines have two distinct PII leak vectors: ingestion and inference. A defense-in-depth blueprint with code, using Philter and the Philter AI Proxy.
- Rag
- Security
- Vector-Search
March 25, 2026 Philter · Redaction
Redaction for Financial Services: PCI DSS, GLBA, and the Real-World Data Pipeline
A practitioner's guide to redacting NPPI and cardholder data in financial workflows, mapping PCI DSS, GLBA, and state requirements to the Philterd toolkit.
- Finance
- Pci-Dss
- GLBA
March 11, 2026 Phileas · Philter · Integrations
Architecting Privacy in Kafka: Real-Time Redaction for Streaming Data
Three battle-tested patterns for redacting PII inside Apache Kafka pipelines: Phileas as an embedded library, Philter over HTTP, or a Kafka Connect transform.
- Kafka
- Streaming
- Kinesis
March 4, 2026 AI · LLMs · Philter
Beyond Regex: Why General LLMs Fail at PII Discovery
Regex misses context; general LLMs over-redact and burn GPUs. The right answer is hybrid: pattern matching for the deterministic, specialized AI for the rest.
- Hybrid
- Nlp
- Regex
February 25, 2026 Philter · Integrations
Compliance as Code: Integrating Philter into Your CI/CD Pipeline
Treat data privacy like a unit test. Wire Philter into GitHub Actions, GitLab CI, and pre-commit hooks so PII leaks fail the build, not production.
- Ci-Cd
- Devops
- Automation
February 18, 2026 AWS · Philter · Integrations
Migrating from AWS Comprehend to Philter: A Practical Transition Guide
A side-by-side guide for teams migrating PII detection from AWS Comprehend to self-hosted Philter: API translations, code samples, and a shadow-mode cutover.
- Migration
- Comprehend
- Tco
February 12, 2026 Philter · Redaction
Redaction for Insurance: Claims, Customer Data, and the State-by-State Patchwork
Insurance carriers sit at the intersection of GLBA, HIPAA, state rules, and the NAIC Model Law. A guide to redacting NPPI and PHI in claims and adjuster notes.
- Insurance
- GLBA
- Claims
February 5, 2026 AI · Philter
Open Source vs. Black Box: Why You Can't Afford "Trust Me" Privacy
For a CISO, "trust me" is not a strategy. Why auditable open source is the new enterprise standard for PII redaction, and what it means for compliance.
- Open-Source
- Transparency
- Ciso
January 25, 2026 Philter · Redaction
Automating HIPAA Safe Harbor: A Blueprint for Healthcare Data Pipelines
How the Philterd suite maps to the 18 HIPAA Safe Harbor identifiers, with a deployment blueprint for patient data lakes, research pipelines, and medical RAG.
- Hipaa
- Healthcare
- Safe-Harbor
January 18, 2026 Philter · Redaction
Privacy Shouldn't Be a Guessing Game: Evaluating Redaction with Philter Scope
Stop hoping your redaction works. Philter Scope turns precision, recall, and F1 into a measurable, auditable health score for any redaction pipeline.
- Benchmarking
- Audit
- Philter-Scope
January 17, 2026 AI · Philter · Redaction
Why API-Based Redaction is a Security Antipattern
Sending sensitive data to a third-party redaction API opens the security holes you are trying to close. Why data sovereignty needs a self-hosted engine.
- Security
- Zero-Trust
- Ciso
January 9, 2026 Philter
Redaction for Government and Federal Workloads: FedRAMP, CMMC, ITAR, and the Air-Gap Imperative
Why most commercial PII redaction tools fail federal workloads, and how Philterd's self-hosted architecture maps to FedRAMP, CMMC, ITAR, and air-gapped needs.
- Government
- Fedramp
- Cmmc
December 27, 2025 Philter · AWS
Deploying Philter in Air-Gapped Environments
Deploy Philter and the full Philterd toolkit into completely offline VPCs and disconnected cloud regions. No phone-home, no telemetry, no external dependencies.
- Air-Gapped
- Government
- Security
December 19, 2025 Phileas · Philter
From Phileas to Philter: The Evolution of Our Open Source Engine
How a focused open source experiment grew into the engine behind a full enterprise PII suite, and why both Phileas and Philter still ship independently.
- Open-Source
- History
- Nlp
December 14, 2025 Philter · Redaction
Redaction for Education: FERPA, Student Records, and Research Data Pipelines
FERPA governs student records but rarely gets the attention HIPAA does. A practitioner's guide for university IT, edtech vendors, and research-data teams.
- Education
- FERPA
- Student-Records
December 6, 2025 Integrations · Philter
Snowflake PII Redaction: A Practical Integration Guide
Three production-grade patterns for redacting PII inside Snowflake: external functions, Java UDFs, and ETL-stage redaction, with code and trade-offs.
- Snowflake
- Integration
- Sql
November 18, 2025 Phileas · Philter · Integrations
Redacting PII and PHI in OpenSearch Data Prepper Pipelines
Add a Phileas or Philter redaction step to an OpenSearch Data Prepper pipeline so PII and PHI are removed before data lands in your search index.
- Opensearch
- Data-Prepper
- Search-Index
March 23, 2025 Philter
Philter 3.1.0
Philter 3.1.0 is now on the AWS, Google Cloud, and Azure marketplaces. Built on Phileas 2.12.0 with filter priorities, zip-code validation, and context windows.
- Release
- Marketplace
March 20, 2025 Phileas
Phileas 2.12.0
Phileas 2.12.0, the popular open source redaction library, is released. A new Philter built on it is coming to the AWS, Google Cloud, and Azure marketplaces.
- Release
February 17, 2025 LLMs · Philter
Why Using an LLM to Redact PII and PHI is a Bad Idea
Lots of posts show how to redact PII and PHI text with a large language model (LLM). Can we really just let an LLM handle it? Here is why that is a bad idea.
- Llm
- Antipattern
- Hybrid
January 11, 2025 Phileas · Redaction · Search
Shielding Your Search: Redacting PII and PHI in Elasticsearch with the Search Redact Plugin
The Search Redact plugin for Elasticsearch, built on Phileas, redacts PII and PHI from your search results so sensitive data stays private.
- Elasticsearch
- Search-Redact
- Search-Index
January 10, 2025 Phileas · Redaction · Search
Shielding Your Search: Redacting PII and PHI in OpenSearch with the Search Redact Plugin
The Search Redact plugin for OpenSearch, built on Phileas, redacts PII and PHI from your search results so sensitive data stays private.
- Opensearch
- Search-Redact
- Search-Index
January 6, 2025 Phileas
Phileas 2.10.0
Phileas 2.10.0 is released. This version removes the commons-csv and guava dependencies, adds a bloom filter, updates pdfbox to 3.0, and adds fixes.
- Release
December 1, 2024 Integrations · Phileas
Phileas in Graylog – Removing PII from Logs
Graylog has integrated Phileas, the open source PII and PHI redaction engine, into its log management platform to identify and redact sensitive data in logs.
- Graylog
- Logs
- Integration
November 27, 2024 Phileas
Phileas 2.9.1
Phileas 2.9.1 is released: a new line separator in LineWidthSplitService, empty ph-eye spans no longer signal failure, and a default PhEyeConfiguration value.
- Release
November 17, 2024 AWS · Integrations · Philter · Redaction
Automatically Redacting PII and PHI from Files in Amazon S3 using Amazon Macie and Philter
Use Amazon Macie to find sensitive data in S3, then automatically redact PII and PHI such as SSNs and phone numbers from those files with Philter.
- Aws
- S3
- Macie
October 10, 2024 AI · Philter
Philter as an AI Policy Layer
An AI policy layer inspects AI-generated text to prevent sensitive information from being exposed, removing names, addresses, telephone numbers, and more.
- Llm
- Ai-Ethics
- Policy
May 22, 2023 Phileas · Redaction
Phileas: The Open Source PII and PHI redaction engine
Introducing Phileas, the open source PII and PHI redaction engine, now available under the Apache license on GitHub. It powers both Philter and Phirestream.
- Open-Source
- Announcement
- History