Score your redaction policies

Philter Scope

Philter Scope is a standalone audit tool that scores redaction policies against gold-standard test data. Stop guessing whether a policy change made the pipeline better. Measure it, version it, and fail the build when it regresses.

Why Philter Scope

Reproducible benchmarks

Same test set, same metrics, every run. Two engineers comparing two policies see the same numbers. No more debates about whether the new rules are actually better.

Gold-standard comparison

Annotate a representative sample of your real text once. Philter Scope compares any policy output against that ground truth and reports precision, recall, and F1 per entity type, along with an entity type confusion matrix showing where detectors misclassify.
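As a rough illustration of the comparison described above, here is a minimal Python sketch that scores predicted spans against gold-standard spans per entity type. It is not Philter Scope's code or API: the `Span` type, the `score_by_entity_type` function, and the exact-match rule (same document, offsets, and type) are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical annotation record: one entity in one document."""
    doc_id: str
    start: int
    end: int
    entity_type: str


def score_by_entity_type(gold, predicted):
    """Return {entity_type: (precision, recall, f1)} using exact span matching."""
    gold_set, pred_set = set(gold), set(predicted)
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})

    # A predicted span is a true positive only if an identical gold span exists.
    for span in pred_set:
        key = "tp" if span in gold_set else "fp"
        counts[span.entity_type][key] += 1

    # Gold spans that were never predicted are misses (false negatives).
    for span in gold_set - pred_set:
        counts[span.entity_type]["fn"] += 1

    scores = {}
    for entity_type, c in counts.items():
        tp, fp, fn = c["tp"], c["fp"], c["fn"]
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[entity_type] = (precision, recall, f1)
    return scores


# Tiny invented data set: one SSN is missed and one DATE is a false alarm.
gold = [
    Span("doc1", 0, 11, "SSN"),
    Span("doc1", 20, 30, "DATE"),
    Span("doc2", 5, 15, "SSN"),
]
predicted = [
    Span("doc1", 0, 11, "SSN"),
    Span("doc1", 20, 30, "DATE"),
    Span("doc2", 40, 50, "DATE"),
]
scores = score_by_entity_type(gold, predicted)
```

In this toy data, SSN has perfect precision but only half its recall, while DATE has the reverse. Under exact matching, a span detected with the wrong type counts as a false positive for the predicted type and a false negative for the gold type, which is the kind of error a confusion matrix makes visible.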

Per-entity breakdown

Aggregate scores hide problems. Philter Scope reports per-entity-type metrics so you can see exactly which detectors are weakest and where the next tuning pass should land.

CI integration

Run it as a step in your CI pipeline. Fail the build when precision or recall regresses below a threshold; catch policy regressions before they reach production.

Audit artifact

The evaluation report is the artifact regulators and auditors actually want to see. Demonstrate that your redaction pipeline is verifiably correct, not just "trust us, it works."

Open source

Pair it with Phileas and Philter, or run it against any redaction output. The evaluation logic is open: your QA team can read every line of the code that scores your policies.

See it in action

The Philter Scope dashboard breaks down precision, recall, and F1 by entity type, so you can see exactly where your policy is strong and where it needs tuning.

[Screenshot: Philter Scope dashboard showing precision, recall, and F1 scores by entity type]
Development moves quickly. Screenshots may not always reflect the current version.

Why measuring redaction matters

Redaction feels binary. The data went in, redacted data came out, and a spot check of a few documents looked clean. But redaction is not binary; it is statistical: every policy decides, entity by entity, what to catch and what to let through. The moment you change a rule, add a detector, or adopt a new model, you have placed a bet on thousands of decisions you will never read by hand. "It looked right on a few examples" is a hope, not an audit artifact. You cannot tune what you cannot measure.

A single headline accuracy number is almost as dangerous as no number at all, because aggregate scores hide exactly the failures that matter. A policy can report 98% overall while quietly missing most medical record numbers or over-redacting the dates your analysts depend on. That is why Philter Scope reports precision, recall, and F1 per entity type against a gold-standard set you annotate once. Precision tells you how much of what you redacted was actually sensitive, so you can see where the policy is destroying useful data. Recall tells you how much of the real PII you caught, so you can see exactly where data is leaking. Per entity, those two numbers turn tuning from guesswork into a targeted decision.
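To make the arithmetic concrete, here is a small Python sketch of how a strong aggregate number can coexist with a badly leaking entity type. The counts are invented for illustration and do not come from any real evaluation, and the headline figure is computed here as overall recall (entities caught divided by entities present), which is one assumption about what an aggregate score might mean.

```python
# Hypothetical counts, invented for illustration:
# entity type -> (gold entities present, entities the policy caught)
counts = {
    "NAME": (6000, 5950),
    "DATE": (3900, 3820),
    "MRN": (100, 30),
}

# Overall recall pools every entity, so common types dominate the result.
total_gold = sum(gold for gold, _ in counts.values())
total_caught = sum(caught for _, caught in counts.values())
overall_recall = total_caught / total_gold

# Per-type recall treats each entity type on its own terms.
per_type_recall = {
    entity_type: caught / gold
    for entity_type, (gold, caught) in counts.items()
}

print(f"overall recall: {overall_recall:.1%}")
for entity_type, recall in per_type_recall.items():
    print(f"{entity_type:>5}: {recall:.1%}")
```

With these numbers the overall recall is 98.0%, yet only 30.0% of medical record numbers are caught. Because MRNs make up 1% of the entities, missing most of them barely moves the headline figure.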

Measurement is what makes a policy safe to change

Policies drift. Models get swapped, input shapes change, and someone tweaks a rule to fix one document. Without a gate, a regression that re-exposes Social Security numbers ships as silently as any other untested change. Scoring a policy in CI turns that risk into a build you can fail: set a floor on recall for the entity types you care about, and a regression is caught before it reaches production, the same way a broken unit test is. The policies you run through Phileas and Philter become something you can version, review, and verify rather than something you hope still works.
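The recall floor described above can be sketched as a small gate script. This is a minimal example under stated assumptions, not Philter Scope's own CI integration: the `RECALL_FLOORS` values, the function names, and the idea of passing in a plain dictionary of per-type recall are all hypothetical, and a real pipeline would read those numbers from the evaluation report.

```python
import sys

# Hypothetical recall floors per entity type; choose values that fit your risk.
RECALL_FLOORS = {"SSN": 0.99, "MRN": 0.95}


def find_regressions(recall_by_type, floors):
    """Return one message for each entity type whose recall is below its floor."""
    failures = []
    for entity_type, floor in floors.items():
        # An entity type missing from the results is treated as recall 0.
        recall = recall_by_type.get(entity_type, 0.0)
        if recall < floor:
            failures.append(
                f"{entity_type}: recall {recall:.3f} is below floor {floor:.3f}"
            )
    return failures


def main(recall_by_type):
    """Print any failures to stderr and return a process exit code."""
    failures = find_regressions(recall_by_type, RECALL_FLOORS)
    for message in failures:
        print(message, file=sys.stderr)
    return 1 if failures else 0


# Example run with invented numbers: SSN recall has slipped below its floor.
exit_code = main({"SSN": 0.97, "MRN": 0.96})
# A CI step would finish with sys.exit(exit_code); a nonzero code fails the build.
```

Treating a missing entity type as zero recall is a deliberate choice in this sketch: if a detector silently disappears from the results, the gate fails rather than passes.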

Measurement is also what an auditor will accept. "Trust us, it works" is not a control. A reproducible report that scores your real redaction output against ground truth is the evidence regulators actually ask for, and it is the difference between asserting your pipeline is correct and proving it. For a deeper walkthrough of the three metrics and how to read them, see "Privacy shouldn't be a guessing game."

Ready to use Philter Scope?

Three ways to get going: deploy the open source version yourself, spin it up from a cloud marketplace, or work with our team directly. Pick the path that fits.

See your options