Your Cloud. Your Data.
Zero-Trust PII Redaction.

Open source PII redaction that runs entirely inside your cloud — for healthcare, finance, legal, and government workloads where a leak isn't an option.

Deploy →

In production since 2017. Or explore the toolkit ↓

Trusted in
  • Healthcare
  • Finance
  • Legal
  • AI & Machine Learning
  • Government
  • Contact Centers
  • Insurance
  • AI Training Data
  • Education

The World-Class Open Source Toolkit for PII Privacy

A complete stack for finding, redacting, monitoring, and auditing sensitive data — from low-level libraries to turnkey services. Each project is released under the permissive and business-friendly Apache license and developed in the open on GitHub.

Core redaction

The engine and API that find and redact PII in text.

AI models & LLM guardrails

Trained models and drop-in proxies for AI workloads.

Discovery & monitoring

Find where PII lives and watch where it flows.

Review, policy & operations

Human review, policy authoring, benchmarking, and privacy analytics.

Human review · Java

Arbiter

Human-in-the-loop PII redaction. Search, review, and override automated detection decisions with structured exemption codes — built for AI training-data prep and regulated everyday workflows.

Not sure which one to start with? Walk through the PII redaction journey →

Client SDKs for Java, .NET, and Go are available alongside the rest of the toolkit at github.com/philterd.

Privacy AI

Specialized Large Language Models trained exclusively for high-accuracy PII discovery, classification, and redaction. An effective PII strategy combines pattern matching for structured data with AI for the unstructured text where rigid patterns fall short.

Pattern-based identification

Predefined character patterns detect structured data like credit card numbers, SSNs, and email addresses. They are fast, predictable, and lightweight to run, but too rigid for unstructured text.
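As a rough illustration of pattern-based identification, here is a minimal Python sketch built on regular expressions. It is not Philter's implementation: the pattern set, the `find_spans` and `redact` helpers, and the `{{LABEL}}` placeholder format are all assumptions made for this example, and real rules would need stricter validation (for instance, a Luhn check on card numbers).

```python
import re

# Illustrative patterns only (assumed for this sketch, not taken from Philter).
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def find_spans(text):
    """Return (start, end, label) for every pattern match, sorted by start."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def redact(text):
    """Replace each matched span with a {{LABEL}} placeholder."""
    out = []
    cursor = 0
    for start, end, label in find_spans(text):
        if start < cursor:
            continue  # skip a span that overlaps one already redacted
        out.append(text[cursor:start])
        out.append("{{" + label + "}}")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)
```

Calling `redact("Reach jane@example.com, SSN 123-45-6789.")` replaces the email address and the SSN with placeholders. A name or a diagnosis mentioned in free text would pass through untouched, which is the rigidity described above.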

LLM-based identification

Trained models read the linguistic context and intent around sensitive data. Highly accurate, adaptable across languages, and effective on the unstructured text where patterns alone fail.

Hybrid by design

Pattern matching provides a high-speed foundation; LLMs add the intelligent oversight unstructured text demands. The two methods complement each other inside every Philterd deployment.
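One way to picture the hybrid approach is a pipeline that runs several detectors over the same text and reconciles their overlapping spans. The Python sketch below is a simplified illustration under stated assumptions, not the Philterd architecture: `Span`, `pattern_detector`, `stub_model_detector`, and the "longest span wins" rule are all hypothetical, and the model is replaced by a trivial stand-in so the example runs on its own.

```python
import re
from typing import Callable, List, NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    label: str
    source: str  # "pattern" or "model"


SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def pattern_detector(text: str) -> List[Span]:
    """Structured identifiers found by a fixed pattern."""
    return [Span(m.start(), m.end(), "SSN", "pattern") for m in SSN.finditer(text)]


def stub_model_detector(text: str) -> List[Span]:
    """Hypothetical stand-in for a trained model: flags the name after 'Dr. '."""
    return [
        Span(m.start(1), m.end(1), "PERSON", "model")
        for m in re.finditer(r"Dr\. ([A-Z][a-z]+ [A-Z][a-z]+)", text)
    ]


def detect(text: str, detectors: List[Callable[[str], List[Span]]]) -> List[Span]:
    """Run every detector, then keep the longest span wherever spans overlap."""
    candidates = [span for detector in detectors for span in detector(text)]
    candidates.sort(key=lambda s: (-(s.end - s.start), s.start))
    kept: List[Span] = []
    for span in candidates:
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    return sorted(kept)
```

With the text "Dr. Maria Lopez recorded SSN 123-45-6789.", the pattern detector contributes the SSN and the stand-in model contributes the name. Keeping the `source` field on each span makes it possible to trace which method produced a given detection.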

Curious how the models are trained and benchmarked? Read about our hybrid approach →

The Three Pillars of Privacy

Everything we build sits on top of three commitments: data stays with you, the source code stays open, and the AI underneath stays purpose-built for the job.

Data Sovereignty

Philter and the rest of the Philterd toolkit run inside your cloud. Your data never leaves your perimeter, never reaches a third-party API, and never lands in someone else's logs.

Open Source Integrity

Transparency is the only way to verify privacy software. Our core engine is Apache 2.0 licensed — your engineers can read every line, audit every decision, and extend the stack on their own terms.

Purpose-Built AI

Generic LLMs make poor privacy filters. We train and ship specialized NLP and deep-learning models built specifically for PII and PHI detection — accurate, tunable, and operationally affordable at scale.

Compliance and Trust

  • HIPAA
  • EU GDPR Compliant
  • CCPA Compliant

Philterd provides a zero-trust architecture for HIPAA, GDPR, and CCPA compliance. The discovery engine operates entirely within your infrastructure: full data sovereignty, no external API dependencies, and no third-party training on your data.

To satisfy HIPAA Safe Harbor requirements, we pair high-speed pattern matching for structured identifiers with specialized AI models for everything else, covering all 18 identifiers listed in 45 CFR § 164.514(b)(2). Healthcare and life-sciences organizations can automate de-identification across massive datasets while preserving the data's utility for research and innovation.

Need help mapping your HIPAA, GDPR, or PCI posture to a Philter deployment? Get an architecture review → · See the full compliance breakdown →

Three ways to get started

Same redaction engine, three paths. Pick the one that fits your team.

Free forever

Open Source

$0 · Open source

Run the entire Philterd toolkit yourself. Full source on GitHub — no license keys, no usage caps, no commercial review.

  • All 9 tools, full source code
  • User's guides and reference docs
  • Community support via GitHub issues
  • Every update and new release

Best for: Engineering-led teams who want to own every layer.

Engagement-based

Engaged

Custom

Work directly with the people who built the toolkit. Custom NLP models, privacy architecture, embedded engineering, and production deployment with full handoff.

  • Custom NLP model training
  • Privacy architecture review
  • Embedded engineering
  • Deployment + knowledge transfer

Best for: Healthcare, finance, and government workloads with custom requirements.

See the full pricing breakdown →

From the blog

Practical posts on PII redaction, AI privacy, and self-hosted compliance.

Philter, Redaction

Introducing Arbiter: Human-in-the-Loop PII Redaction

Automated redaction handles most of the volume; humans handle the last few percent that automation can't. Arbiter is the open source review surface that bridges the two — built on Philter, designed for AI training data and regulated everyday workflows.

Read post →

Read all posts →

For integrators & system builders

Building for someone else?

Philter is the redaction layer integrators bundle into client deliverables. It deploys in the client's cloud and is operated by you, with no per-seat license and no vendor sub-license to negotiate. Reference architectures cover the patterns clients actually buy.

For integrators →

New · Open Source

Introducing Arbiter

Arbiter is the newest addition to our open source toolkit for PII privacy — a human-in-the-loop review surface for redaction pipelines. Reviewers see every detection in context, accept or override automated decisions, and apply structured exemption codes that flow into your audit trail. Built on Philter; designed for AI training-data prep and regulated everyday workflows.

Consulting Services

Accelerate compliance and reduce leak risk by working directly with the creators of Philter. We design, build, and deploy the privacy infrastructure your team will own — not a black box you have to renew every year.

Privacy Architecture

We design end-to-end PII protection for your cloud and AI workloads — data flows, redaction layers, audit trails, and the guardrails that keep them aligned with HIPAA, GDPR, and CCPA.

Custom NLP Models

Off-the-shelf models miss the entities that matter most in your domain. We train specialized PII/PHI detectors on your data and evaluate them with precision and recall figures you can verify yourself.
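For readers who want to check such figures themselves, precision is the share of predicted detections that are correct, and recall is the share of true entities that were found. The Python sketch below assumes exact-match scoring over `(start, end, label)` spans; the function name and the scoring rule are illustrative choices, not a description of how any particular engagement is evaluated.

```python
def precision_recall(predicted, gold):
    """Exact-match span scoring over iterables of (start, end, label) tuples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against division by zero when either set is empty.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

For redaction, recall usually carries more weight than precision: a missed identifier is a leak, while a false positive only removes some harmless text.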

See all consulting services → · Have a specific project in mind? Schedule a 30-min call →

Ready to lock down your data?

Tell us about your stack and the privacy problems you're trying to solve. We'll get back to you within one business day.