Introducing Arbiter: Human-in-the-Loop PII Redaction
Automation handles most of the volume; humans handle the last few percent. Arbiter is the open source review surface that bridges the two, built on Philter.
The TCO of "Free" Cloud PII Redaction: AWS Comprehend, Google DLP, vs Self-Hosted at Scale
Per-character SaaS pricing looks cheap at demo scale and costly in production. A TCO comparison of AWS Comprehend, Google Cloud DLP, and self-hosted Philter.
What is PII? A Practical Guide for Engineers and Compliance Teams
PII is the term everyone uses and few define the same way. A practitioner's guide to what counts as PII, how to find it in real data, and how to handle it.
The Hidden Difficulties of Redacting PDF Documents
PDFs leak redacted text in unexpected ways: invisible text layers, embedded files, and metadata. Why PDF redaction is harder than it looks, with Philter's fix.
Redaction for Legal and E-Discovery: Privilege, Rule 9037, and the In-House Counsel's Pipeline
How automated redaction fits legal workflows: court filings, e-discovery production, privilege review, and M&A due diligence for in-house counsel.
The Ethics of Training: Why We Use Synthetic Data
A privacy tool should never be trained on the data it protects. Why Philterd's models are built entirely on synthetic data, and what that means for compliance.
Building a HIPAA-Compliant Medical Chatbot
Why generic RAG chatbots fail HIPAA, and a blueprint for building a medical chatbot that satisfies Safe Harbor at ingestion, retrieval, and inference.
Building a Privacy-Aware RAG System
RAG pipelines have two distinct PII leak vectors: ingestion and inference. A defense-in-depth blueprint with code, using Philter and the Philter AI Proxy.
Redaction for Financial Services: PCI DSS, GLBA, and the Real-World Data Pipeline
A practitioner's guide to redacting NPPI and cardholder data in financial workflows, mapping PCI DSS, GLBA, and state requirements to the Philterd toolkit.
PII vs PHI vs NPPI: An Engineer's Guide
Three acronyms that are used interchangeably but shouldn't be. A reference for engineers and compliance leads, with the regulatory and architectural take on each.
Architecting Privacy in Kafka: Real-Time Redaction for Streaming Data
Three battle-tested patterns for redacting PII inside Apache Kafka pipelines: Phileas as an embedded library, Philter over HTTP, or a Kafka Connect transform.
Beyond Regex: Why General LLMs Fail at PII Discovery
Regex misses context; general LLMs over-redact and burn GPUs. The right answer is hybrid: pattern matching for the deterministic, specialized AI for the rest.
Compliance as Code: Integrating Philter into Your CI/CD Pipeline
Treat data privacy like a unit test. Wire Philter into GitHub Actions, GitLab CI, and pre-commit hooks so PII leaks fail the build, not production.
Migrating from AWS Comprehend to Philter: A Practical Transition Guide
A side-by-side guide for teams migrating PII detection from AWS Comprehend to self-hosted Philter: API translations, code samples, and a shadow-mode cutover.
Redaction for Insurance: Claims, Customer Data, and the State-by-State Patchwork
Insurance carriers sit at the intersection of GLBA, HIPAA, state rules, and the NAIC Model Law. A guide to redacting NPPI and PHI in claims and adjuster notes.
Open Source vs. Black Box: Why You Can't Afford "Trust Me" Privacy
For a CISO, "trust me" is not a strategy. Why auditable open source is the new enterprise standard for PII redaction, and what it means for compliance.
Automating HIPAA Safe Harbor: A Blueprint for Healthcare Data Pipelines
How the Philterd suite maps to the 18 HIPAA Safe Harbor identifiers, with a deployment blueprint for patient data lakes, research pipelines, and medical RAG.
Privacy Shouldn't Be a Guessing Game: Evaluating Redaction with Philter Scope
Stop hoping your redaction works. Philter Scope turns precision, recall, and F1 into a measurable, auditable health score for any redaction pipeline.
Why API-Based Redaction is a Security Antipattern
Sending sensitive data to a third-party redaction API opens the security holes you are trying to close. Why data sovereignty needs a self-hosted engine.
Redaction for Government and Federal Workloads: FedRAMP, CMMC, ITAR, and the Air-Gap Imperative
Why most commercial PII redaction tools fail federal workloads, and how Philterd's self-hosted architecture maps to FedRAMP, CMMC, ITAR, and air-gapped needs.
Deploying Philter in Air-Gapped Environments
Deploy Philter and the full Philterd toolkit into completely offline VPCs and disconnected cloud regions. No phone-home, no telemetry, no external dependencies.
From Phileas to Philter: The Evolution of Our Open Source Engine
How a focused open source experiment grew into the engine behind a full enterprise PII suite, and why both Phileas and Philter still ship independently.
Redaction for Education: FERPA, Student Records, and Research Data Pipelines
FERPA governs student records but rarely gets the attention HIPAA does. A practitioner's guide for university IT, edtech vendors, and research-data teams.
Snowflake PII Redaction: A Practical Integration Guide
Three production-grade patterns for redacting PII inside Snowflake: external functions, Java UDFs, and ETL-stage redaction, with code and trade-offs.
Philter 3.1.0
Philter 3.1.0 is now on the AWS, Google Cloud, and Azure marketplaces. Built on Phileas 2.12.0 with filter priorities, zip-code validation, and context windows.
Why Using an LLM to Redact PII and PHI is a Bad Idea
Lots of posts show how to redact PII and PHI from text with a large language model (LLM). Can we really just let an LLM handle it? Here is why that is a bad idea.
Automatically Redacting PII and PHI from Files in Amazon S3 using Amazon Macie and Philter
Use Amazon Macie to find sensitive data in S3, then automatically redact PII and PHI such as SSNs and phone numbers from those files with Philter.
Philter as an AI Policy Layer
An AI policy layer inspects AI-generated text to prevent sensitive information from being exposed, removing names, addresses, telephone numbers, and more.
Redacting Text in Amazon Kinesis Data Firehose
Amazon Kinesis Data Firehose is a managed streaming service that moves data from sources to destinations like S3 and Redshift. This post shows how to redact PII in that stream.