Migration guide

Migrate from Microsoft Presidio to Philter

Teams move from Presidio to Philter when their prototype needs to be productionized, when they need domain-tuned NLP models rather than generic spaCy, or when they need a policy engine richer than Presidio's recognizer registry. This guide covers the concept mapping, the migration steps, and the operational differences.


Why teams migrate

The reasons teams give for migrating off Presidio, ordered by how often we hear them.

Production hardening

Presidio is a strong Python prototyping framework. Going to production (a stable API, HA deployment, monitoring, audit logs, marketplace billing) is work teams would rather not own. Philter ships as a turnkey API with the operational surface already built.

Domain-specific accuracy

Presidio's default analyzers use generic spaCy or transformer models. Philter ships purpose-trained lenses for healthcare and other domains, with measurably higher precision and recall on those text types.

Richer policy engine

Presidio's AnalyzerEngine and AnonymizerEngine cover detection and replacement. Philter adds severity scoring, conditional rules, format-preserving encryption, synthetic-value replacement, and per-entity strategies that go beyond Presidio's anonymizers.

Concept mapping

How Presidio concepts translate to Philter equivalents. The mapping is mostly direct; Philter's engine is a superset of Presidio's recognizer model.

| Microsoft Presidio | Philter | Notes |
| --- | --- | --- |
| `AnalyzerEngine` | Philter detection (regex + dictionaries + PhEye lenses) | Same role: find entities in text. Philter's pattern layer covers Presidio's regex recognizers; PhEye covers Presidio's NLP-based recognizers. |
| `AnonymizerEngine` + operators | Filter strategies (mask, redact, encrypt, FPE, replace, abbreviate), expressed in policy JSON or one line of PhiSQL | Each Presidio operator maps to a Phileas strategy. Phileas adds format-preserving encryption, date-shift, and consistent pseudonymization, and PhiSQL lets you write the rule declaratively instead of building an operator config in code. |
| Built-in recognizers (PERSON, EMAIL, SSN, etc.) | Default policy entities | Direct one-to-one mapping for most common entity types. Philter ships additional ones (medical record numbers, custom identifiers) out of the box. |
| Custom `PatternRecognizer` | Custom identifier definitions in policy JSON | Define your own regex, dictionaries, or identifier patterns in the policy file. No Python class to subclass. |
| Context-aware recognizers (with NLP) | PhEye lenses (purpose-trained models) | Philter's lenses are purpose-trained for PII/PHI detection, not generic NER. Healthcare and other domain lenses are available out of the box. |
| Operator config (mask, redact, replace, hash) | Per-entity filter strategy in policy | Configured per entity type in the policy JSON. Conditional rules and severity scoring give you finer control. |
| `presidio-analyzer` Python package | Philter API (HTTP) or Phileas library | Philter is a turnkey HTTP service. If you want an embedded library (the closest like-for-like with Presidio), use Phileas in Java, Python, or .NET. |
| Docker images on quay.io | AWS / GCP / Azure marketplace or self-built container | Philter is available on the cloud marketplaces for one-click deploy, or as a container image for custom builds. |

Migration steps

A safe migration runs Philter in parallel with Presidio against a sample of production text, validates parity, then cuts over. Most teams complete the migration in one to three weeks.

1. Catalog your recognizers. List every recognizer your Presidio deployment uses: built-ins, custom PatternRecognizers, NLP-based ones. Note the operators (mask, redact, replace) applied to each. This becomes your initial Philter policy.
2. Translate the recognizers to a Philter policy. Pattern recognizers become regex or identifier entries in the policy JSON. NLP recognizers map to PhEye lens entries. Operators map to per-entity filter strategies. Use the Redaction Policy Editor to build this interactively.
3. Deploy Philter alongside Presidio. Deploy Philter from a cloud marketplace or as a container in your existing infrastructure. Keep Presidio running. No application code changes yet.
4. Run shadow mode. For a sample of production traffic, send text to both Presidio and Philter. Compare entity detections side by side. Tune the Philter policy to close any meaningful gaps, especially for domain-specific entities where PhEye lenses should outperform generic NER.
5. Cut over one integration at a time. Switch one application or pipeline at a time from Presidio to Philter. Monitor entity-type counts and processing latency. The Philter API and the Presidio HTTP wrapper expose similar input/output shapes, so the application-code change is usually a few lines.
6. Decommission Presidio. Once all integrations are stable on Philter, remove the Presidio service and its dependencies. The maintenance burden goes with it.
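
The shadow-mode comparison can be sketched in a few lines of plain Python. This is a minimal illustration, not part of either product: the `Span` type, the `compare_detections` helper, and the sample detections are hypothetical, and it assumes you have already normalized both systems' output to shared entity labels and character offsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One detected entity: character offsets plus a normalized type label."""
    start: int
    end: int
    entity_type: str


def compare_detections(presidio_spans, philter_spans):
    """Split detections into agreed, Presidio-only, and Philter-only sets."""
    presidio_set, philter_set = set(presidio_spans), set(philter_spans)
    return {
        "both": presidio_set & philter_set,
        "presidio_only": presidio_set - philter_set,  # possible Philter misses
        "philter_only": philter_set - presidio_set,   # possible extra coverage
    }


# Hypothetical detections for one document, already mapped to shared labels.
presidio = [Span(8, 24, "EMAIL"), Span(43, 54, "SSN")]
philter = [Span(8, 24, "EMAIL"), Span(43, 54, "SSN"), Span(60, 68, "MRN")]

diff = compare_detections(presidio, philter)
for bucket, spans in diff.items():
    print(bucket, sorted(spans, key=lambda s: s.start))
```

Exact-offset matching is deliberately strict. If the two systems disagree slightly on span boundaries, an overlap-based match may give a fairer picture of parity.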

Architecture changes

Presidio is typically deployed as two Python services (analyzer and anonymizer) that your application calls. Philter is a single container that exposes a unified redaction API. For HA, run two or more Philter instances behind an internal load balancer. The PhEye model server runs as a sidecar or shared service that Philter calls for NLP-based detection.
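
From the application's side, the single-service model means one HTTP request per document, sent to the load balancer address. The sketch below only builds the request and does not send it. The `/api/filter` path, the `c`/`d`/`p` query parameter names, the plain-text body, and the host name are all assumptions made for illustration; confirm them against the Philter API documentation for your version.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumptions, not confirmed by this guide: the /api/filter path, the
# c/d/p query parameter names, and the plain-text request body.
FILTER_PATH = "/api/filter"


def build_filter_request(base_url, text, context, document_id, policy):
    """Build (but do not send) a redaction request for one Philter endpoint."""
    query = urlencode({"c": context, "d": document_id, "p": policy})
    return Request(
        url=f"{base_url.rstrip('/')}{FILTER_PATH}?{query}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


# Point base_url at the internal load balancer, not at a single instance.
req = build_filter_request(
    "https://philter.internal.example:8080",  # hypothetical host
    "Contact john@example.com or call about SSN 123-45-6789.",
    context="support",
    document_id="doc-001",
    policy="contact-redaction",
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. Keeping request construction separate makes it easy to unit-test the integration before any traffic is switched.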

Cost comparison

Presidio is free open source; the cost is your operational time to deploy, monitor, scale, and maintain it. Philter on the cloud marketplaces is $0.49/hr per instance, which covers a turnkey API with vendor support, marketplace billing, and the PhEye lens models. For most teams, the cost trade-off comes out in favor of Philter once you include engineering time spent operating Presidio in production. The open-source Phileas library is also free if you want to embed the engine directly without running a service.
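
To make the trade-off concrete, here is a rough back-of-the-envelope calculation. Only the $0.49/hr price comes from this guide; the 730-hour month, the two-instance HA setup, and the engineering hourly cost are illustrative assumptions to replace with your own numbers.

```python
HOURLY_RATE_USD = 0.49   # marketplace price per instance, from this guide
HOURS_PER_MONTH = 730    # assumption: average month (8,760 hours / 12)


def monthly_cost(instances: int) -> float:
    """Marketplace cost in USD for a number of always-on instances."""
    return round(HOURLY_RATE_USD * HOURS_PER_MONTH * instances, 2)


def break_even_hours(instances: int, engineer_hourly_cost: float) -> float:
    """Engineering hours per month at which self-operating costs the same."""
    return monthly_cost(instances) / engineer_hourly_cost


print(monthly_cost(1))              # one always-on instance
print(monthly_cost(2))              # minimal HA pair
print(break_even_hours(2, 100.0))   # assumed engineering cost of $100/hr
```

Under these assumptions an HA pair costs roughly $715 per month, which corresponds to about seven hours of engineering time at the assumed rate.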

Common pitfalls

Direct port of recognizers without re-tuning. Presidio's NLP recognizers are tuned for generic NER. Translating them to PhEye lenses is a chance to use the purpose-trained models instead. Don't settle for matching Presidio's accuracy where Philter can do better.
Skipping the policy engine features. If you port Presidio's recognizers one-to-one and stop, you miss Philter's severity scoring, conditional rules, and format-preserving encryption. Audit downstream consumers and adjust the policy to use the richer strategies where they help.
Underestimating the operational delta. Presidio in production usually has a long tail of operational work: scaling decisions, model updates, monitoring, security patches. Migrating to Philter shifts that work to the vendor. Plan for what your team will do with the freed-up time.

Before and after: the same redaction in code

Because Presidio is a library, the most direct migration target is Phileas, the embeddable engine under Philter. Here is the same job (detect an email and an SSN, then de-identify both) in Presidio and in Phileas, side by side.

Presidio (Python)

In Presidio you call the analyzer, then build an operator config for the anonymizer:

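```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

text = "Contact john@example.com or call about SSN 123-45-6789."

# Restrict analysis to the two entity types this example handles, so other
# detections (for example PERSON) do not fall through to the default operator.
results = analyzer.analyze(
    text=text,
    entities=["EMAIL_ADDRESS", "US_SSN"],
    language="en",
)

# Replace each detected entity with a fixed placeholder.
anonymized = anonymizer.anonymize(
    text=text,
    analyzer_results=results,
    operators={
        "EMAIL_ADDRESS": OperatorConfig("replace", {"new_value": "<EMAIL>"}),
        "US_SSN": OperatorConfig("replace", {"new_value": "<SSN>"}),
    },
)
print(anonymized.text)
# Intended output (Presidio's US_SSN recognizer may reject well-known sample
# numbers such as this one; use a realistic test value if it is not detected):
# Contact <EMAIL> or call about SSN <SSN>.
```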

Phileas (Python)

In Phileas the same rules live in a policy, and the engine handles detection and manipulation in one call:

```python
from phileas.policy.policy import Policy
from phileas.services.filter_service import FilterService

policy = Policy.from_dict({
    "name": "contact-redaction",
    "identifiers": {
        "emailAddress": {"emailAddressFilterStrategies": [
            {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}]},
        "ssn": {"ssnFilterStrategies": [
            {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}]},
    },
})

result = FilterService().filter(
    policy=policy,
    context="support",
    document_id="doc-001",
    text="Contact john@example.com or call about SSN 123-45-6789.",
)
print(result.filtered_text)
# Contact {{{REDACTED-email-address}}} or call about SSN {{{REDACTED-ssn}}}.
```

The same policy runs unchanged on the Java and .NET builds of Phileas: the policy schema is shared across all three runtimes.

The same policy in PhiSQL

Instead of hand-writing the policy JSON, you can express it in PhiSQL, which compiles to the same schema and is version-controlled and reviewable like any other code:

```
POLICY contact_redaction;

REDACT EMAIL_ADDRESS WITH REDACT(format='{{{REDACTED-%t}}}');
REDACT SSN          WITH REDACT(format='{{{REDACTED-%t}}}');
```

Translating the rest of your Presidio setup

The common Presidio building blocks map cleanly:

Custom PatternRecognizer becomes a DEFINE IDENTIFIER statement (or a custom identifier entry in policy JSON). No Python class to subclass:
```
DEFINE IDENTIFIER 'MRN' MATCHING '\bMRN[\s:#]*\d{5,}\b' CASE INSENSITIVE
  WITH REDACT(format='{{{REDACTED-MRN}}}');
```
NLP-based recognizers (spaCy, Stanza, or transformer) become a DETECT PHEYE statement backed by PhEye PII/PHI-trained models:
```
DETECT PHEYE LABELS ('PERSON') WITH REDACT;
```
Anonymizer operators map to strategies: Presidio's replace becomes a Phileas replace or static value, redact becomes REDACT, mask becomes MASK, hash becomes a hash strategy, and encrypt becomes crypto-replace. Phileas also offers strategies with no built-in Presidio equivalent, such as format-preserving encryption and date-shift:
```
REDACT SSN  WITH FPE_ENCRYPT;
REDACT DATE WITH SHIFT(days=30);
```
Conditional logic that would be custom Python around Presidio becomes a WHERE predicate. For example, keep only the last four digits of a credit card when detection confidence is high:
```
REDACT CREDIT_CARD WITH LAST_4 WHERE CONFIDENCE > 0.85;
```
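
Before porting a custom pattern, it can help to check the regex against a few sample strings. The sketch below uses Python's `re` module with the MRN pattern from the example above. It assumes the policy engine's regex dialect behaves like Python's for this pattern, and the sample strings are made up for illustration.

```python
import re

# Pattern copied from the DEFINE IDENTIFIER example above;
# CASE INSENSITIVE is expressed here as re.IGNORECASE.
MRN_PATTERN = re.compile(r"\bMRN[\s:#]*\d{5,}\b", re.IGNORECASE)

# Hypothetical sample text mapped to the matches we expect from it.
samples = {
    "Patient MRN: 0012345 admitted today.": ["MRN: 0012345"],
    "mrn#98765 flagged for review.": ["mrn#98765"],
    "MRN 1234 is too short to match.": [],
}

for text, expected in samples.items():
    found = MRN_PATTERN.findall(text)
    assert found == expected, (text, found)
print("all samples behave as expected")
```

Reusing the same sample strings in the shadow-mode comparison gives a quick regression check whenever the pattern changes.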

A practical migration ports your recognizers and operators first to reach parity, then revisits the policy to use the conditional rules, consistent pseudonymization, and format-preserving strategies that were custom code (or simply unavailable) under Presidio.
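
The parity-first pass can be partly mechanized. The sketch below turns a catalog of Presidio entity types and operators into a dict shaped like the policy example earlier in this guide. The `build_policy` helper and both mapping tables are hypothetical: only the `emailAddress` and `ssn` keys appear in this guide, and whether `MASK` is accepted as a strategy value in policy JSON is an assumption to verify against the Phileas documentation.

```python
# Mapping tables are illustrative assumptions: only the email and SSN keys
# appear in this guide's policy example; extend them from the Phileas docs.
IDENTIFIER_KEYS = {"EMAIL_ADDRESS": "emailAddress", "US_SSN": "ssn"}
STRATEGIES = {"redact": "REDACT", "mask": "MASK"}


def build_policy(name, catalog):
    """Turn {presidio_entity: presidio_operator} into a policy-shaped dict.

    Returns the policy plus a list of entries that need manual translation.
    """
    identifiers, unmapped = {}, []
    for entity, operator in catalog.items():
        key, strategy = IDENTIFIER_KEYS.get(entity), STRATEGIES.get(operator)
        if key is None or strategy is None:
            unmapped.append((entity, operator))
            continue
        identifiers[key] = {f"{key}FilterStrategies": [{"strategy": strategy}]}
    return {"name": name, "identifiers": identifiers}, unmapped


# Hypothetical catalog produced by the first migration step.
catalog = {"EMAIL_ADDRESS": "redact", "US_SSN": "mask", "PERSON": "replace"}
policy, unmapped = build_policy("contact-redaction", catalog)
print(policy)
print("needs manual review:", unmapped)
```

Anything left in the unmapped list is a candidate for the second pass, where conditional rules or PhEye lenses may be a better fit than a one-to-one port.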

Plan the migration with the team that built Philter

A 30-minute call with Jeff covers your current setup, the migration path that fits your stack, and where the gotchas usually live. No sales pitch.
