How to Tell if a PII Redaction Model Is Any Good
A model count says nothing about quality. What proves a PII redaction model works: held-out evaluation, precision and recall, and an auditable model card.
How We Built PhEye's PII Name Models
Four open GLiNER name detectors for PhEye, from a 90 MB xsmall to a high-capacity large: why we built name-only models, how we trained them, and how they fit.
How We Use Philter Scope to Evaluate Our PII Models
How we use Philter Scope, our open source auditing tool, to evaluate our own PII models, shown on the ph-eye-pii-en name models.
How to Redact PII Before Sending to an LLM: Chat, RAG, and AI Agents
Redact PII and PHI before any prompt reaches an LLM, keeping data on your own infrastructure. A self-hosted pipeline for chat, RAG, and AI agents.
Prompt Engineering for Privacy: Practical Patterns for Not Leaking PII
Every prompt sent to an LLM is a data egress point. Six concrete patterns for structuring prompts, redacting inputs, and scanning outputs so PII doesn't leak.
PII in Vector Embeddings: A Defense Guide
Embeddings may look like plain numbers, but research shows they are partially invertible. A practical defense guide for vector stores against PII recovery attacks.
The Ethics of Training: Why We Use Synthetic Data
A privacy tool should never be trained on the data it protects. Why Philterd's models are built entirely on synthetic data, and what that means for compliance.
Building a HIPAA-Compliant Medical Chatbot
Why generic RAG chatbots fail HIPAA, and a blueprint for building a medical chatbot that satisfies Safe Harbor at ingestion, retrieval, and inference.
Building a Privacy-Aware RAG System
RAG pipelines have two distinct PII leak vectors: ingestion and inference. A defense-in-depth blueprint with code, using Philter and the Philter AI Proxy.
Beyond Regex: Why General LLMs Fail at PII Discovery
Regex misses context; general LLMs over-redact and burn GPUs. The right answer is hybrid: pattern matching for the deterministic, specialized AI for the rest.
Open Source vs. Black Box: Why You Can't Afford "Trust Me" Privacy
For a CISO, "trust me" is not a strategy. Why auditable open source is the new enterprise standard for PII redaction, and what it means for compliance.
Why API-Based Redaction is a Security Antipattern
Sending sensitive data to a third-party redaction API opens the security holes you are trying to close. Why data sovereignty needs a self-hosted engine.
Using an LLM or Pattern-based Rules for PII/PHI Redaction
Comparing the two main approaches to redacting PII and PHI: an LLM versus pattern-based rules. How each handles accuracy, cost, and GDPR or HIPAA compliance.
Philter as an AI Policy Layer
An AI policy layer inspects AI-generated text to prevent sensitive information from being exposed, removing names, addresses, telephone numbers, and more.