Introducing Arbiter: Human-in-the-Loop PII Redaction
Automated redaction handles most of the volume; humans handle the last few percent that automation can't. Arbiter is the open source review surface that bridges the two — built on Philter, designed for AI training data and regulated everyday workflows.
The TCO of "Free" Cloud PII Redaction: AWS Comprehend, Google DLP, vs Self-Hosted at Scale
Per-character SaaS pricing looks cheap at demo scale and gets eye-watering at production scale. A worked-example TCO comparison: AWS Comprehend, Google Cloud DLP, and self-hosted Philter on the marketplace.
What is PII? A Practical Guide for Engineers and Compliance Teams
PII is the term everyone uses and few people define the same way. A practitioner's guide to what counts as PII, how to find it in real data, and how to handle it without breaking everything downstream.
The Hidden Difficulties of Redacting PDF Documents
PDFs leak redacted text in ways most people don't anticipate — invisible text layers, embedded files, attached portfolios, metadata, the works. A deep dive into why PDF redaction is harder than it looks, with famous failures and Philter's approach.
Redaction for Legal and E-Discovery: Privilege, Rule 9037, and the In-House Counsel's Pipeline
How automated redaction fits into legal workflows — court filings, e-discovery production, privilege review, and M&A due diligence. With identifier mappings and architectures for in-house counsel and legal-tech teams.
The Ethics of Training: Why We Use Synthetic Data
A privacy tool should never be trained on the very data it's meant to protect. Here's why Philterd's models are built entirely on synthetic data — and what that means for your compliance posture.
Building a HIPAA-Compliant Medical Chatbot
Why generic RAG chatbots fail HIPAA — and a step-by-step blueprint for building a medical chatbot that satisfies Safe Harbor at ingestion, retrieval, and inference. With BAA considerations and a self-hosted-LLM alternative.
Building a Privacy-Aware RAG System
RAG pipelines have two distinct PII leak vectors: ingestion and inference. A defense-in-depth blueprint with code, using Philter, Philter AI Proxy, and the rest of the Philterd toolkit.
Redaction for Financial Services: PCI DSS, GLBA, and the Real-World Data Pipeline
A practitioner's guide to redacting NPPI and cardholder data in financial workflows — mapping PCI DSS, GLBA, and state requirements to the Philterd toolkit. With architecture patterns for call centers, KYC, and log streams.
PII vs PHI vs NPPI: An Engineer's Guide
Three acronyms that get used interchangeably and shouldn't be. A short, definitional reference for engineers and compliance leads, with the regulatory framework and the architectural implication for each.
Architecting Privacy in Kafka: Real-Time Redaction for Streaming Data
Three battle-tested patterns for redacting PII inside Apache Kafka pipelines — using Phileas as an embedded library, Philter over HTTP, or a Kafka Connect transform. With code, deployment notes, and operational guidance.
Beyond Regex: Why General LLMs Fail at PII Discovery
Regex misses context; general LLMs over-redact and burn GPUs. The right answer is hybrid: pattern matching for what's deterministic, specialized AI for what isn't.
Compliance as Code: Integrating Philter into Your CI/CD Pipeline
Treat data privacy like a unit test. How to wire Philter into GitHub Actions, GitLab CI, and pre-commit hooks so PII leaks fail the build instead of surfacing in production.
Migrating from AWS Comprehend to Philter: A Practical Transition Guide
A side-by-side migration guide for teams moving PII detection from AWS Comprehend to self-hosted Philter. API translations, code samples, deployment patterns, and a safe shadow-mode cutover.
Redaction for Insurance: Claims, Customer Data, and the State-by-State Patchwork
Insurance carriers sit at the intersection of GLBA, HIPAA, state insurance rules, and the NAIC Model Law. A practitioner's guide to redacting NPPI and PHI in claims data, adjuster notes, and customer correspondence.
Open Source vs. Black Box: Why You Can't Afford "Trust Me" Privacy
For a CISO, "trust me" is not a strategy. Why auditable open source is the new enterprise standard for PII redaction — and what that means for compliance, vetting, and vendor lock-in.
Automating HIPAA Safe Harbor: A Blueprint for Healthcare Data Pipelines
How the Philterd suite maps directly to the 18 HIPAA Safe Harbor identifiers (45 CFR § 164.514(b)(2)) — with a deployment blueprint for patient data lakes, clinical research pipelines, and medical RAG systems.
Privacy Shouldn't Be a Guessing Game: Evaluating Redaction with Philter Scope
Stop hoping your redaction works. Philter Scope turns precision, recall, and F1 into a measurable, auditable health score for any redaction pipeline.
Why API-Based Redaction is a Security Antipattern
Sending sensitive data to a third-party redaction API creates the security holes you're trying to close. Here's why true data sovereignty requires a self-hosted engine — and how Philter delivers it.
Redaction for Government and Federal Workloads: FedRAMP, CMMC, ITAR, and the Air-Gap Imperative
Why most commercial PII redaction tools fail to qualify for federal workloads — and how Philterd's open source, self-hosted architecture maps cleanly to FedRAMP, CMMC, ITAR, and air-gapped deployment requirements.
Deploying Philter in Air-Gapped Environments
How to deploy Philter — and the rest of the Philterd toolkit — into completely offline VPCs and disconnected cloud regions. No phone-home, no hidden telemetry, no external dependencies.
From Phileas to Philter: The Evolution of Our Open Source Engine
How a focused open source experiment grew into the engine behind a full enterprise PII suite — and why both Phileas and Philter still ship independently.
Redaction for Education: FERPA, Student Records, and Research Data Pipelines
FERPA governs student records but rarely gets the architectural attention HIPAA does. A practitioner's guide for university IT, edtech vendors, and research-data teams managing student PII at scale.
Snowflake PII Redaction: A Practical Integration Guide
Three production-grade patterns for redacting PII inside Snowflake: external functions, Java UDFs, and ETL-stage redaction. With code, performance trade-offs, and when to pick each.
Philter 3.1.0
Philter 3.1.0 is now available on the AWS, Google Cloud, and Azure marketplaces. Built on Phileas 2.12.0 with filter priorities, zip-code validation, and per-filter context window sizes.
Why Using an LLM to Redact PII and PHI is a Bad Idea
We have seen a lot of posts on various social media and blogging platforms (and you probably have too) showing how you can redact text using a large language model (LLM). They present a fairly simple solution to the complex problem of redaction. Can we really just let an LLM handle our text redaction…
Automatically Redacting PII and PHI from Files in Amazon S3 using Amazon Macie and Philter
Amazon Macie is “a data security service that discovers sensitive data using machine learning and pattern matching.” With Amazon Macie you can find potentially sensitive information in files in your Amazon S3 buckets, but what do you do when Amazon Macie finds a file that contains an SSN, phone number, or other piece of sensitive information?…
Philter as an AI Policy Layer
An AI policy layer is an important part of every source of AI-generated text because it inspects the AI-generated text to prevent sensitive information from being exposed. A policy layer can help remove information such as names, addresses, and telephone numbers from…
Redacting Text in Amazon Kinesis Data Firehose
Amazon Kinesis Data Firehose is a managed streaming service designed to move large amounts of data from one place to another. For example, you can take data from sources such as Amazon CloudWatch, AWS IoT, and custom applications using the AWS SDK and deliver it to destinations including Amazon S3, Amazon Redshift, Amazon Elasticsearch, and other services. In this post…