Before and after: the same redaction in code
Because Presidio is a library, the most direct migration target is Phileas, the embeddable engine under Philter. Here is the same job (detect an email and an SSN, then de-identify both) in Presidio and in Phileas, side by side.
Presidio (Python)
In Presidio you call the analyzer, then build an operator config for the anonymizer:
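```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

text = "Contact john@example.com or call about SSN 123-45-6789."

# Step 1: detect entities (a list of RecognizerResult spans with scores)
results = analyzer.analyze(text=text, language="en")

# Step 2: tell the anonymizer how to transform each entity type
anonymized = anonymizer.anonymize(
    text=text,
    analyzer_results=results,
    operators={
        "EMAIL_ADDRESS": OperatorConfig("replace", {"new_value": "<EMAIL>"}),
        "US_SSN": OperatorConfig("replace", {"new_value": "<SSN>"}),
    },
)
print(anonymized.text)
# Contact <EMAIL> or call about SSN <SSN>.
```

One caveat if you run this verbatim: Presidio's `US_SSN` recognizer deliberately invalidates well-known sample numbers, and `123-45-6789` is one of them, so only the email is replaced unless you substitute a realistic test value.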
Phileas (Python)
In Phileas the same rules live in a policy, and the engine handles detection and manipulation in one call:
```python
from phileas.policy.policy import Policy
from phileas.services.filter_service import FilterService

policy = Policy.from_dict({
    "name": "contact-redaction",
    "identifiers": {
        "emailAddress": {"emailAddressFilterStrategies": [
            {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}]},
        "ssn": {"ssnFilterStrategies": [
            {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}]},
    },
})

result = FilterService().filter(
    policy=policy,
    context="support",
    document_id="doc-001",
    text="Contact john@example.com or call about SSN 123-45-6789.",
)
print(result.filtered_text)
# Contact {{{REDACTED-email-address}}} or call about SSN {{{REDACTED-ssn}}}.
```
The same policy runs unchanged on the Java and .NET builds of Phileas: the policy schema is shared across all three runtimes.
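The expected output above implies that `%t` in `redactionFormat` is replaced with the filter's type name (`email-address`, `ssn`). To illustrate only that manipulation step, here is a minimal pure-Python sketch that applies a redaction format to spans that have already been detected. The `Span` type and `apply_redactions` function are hypothetical, not part of the Phileas API, and the hard-coded offsets stand in for real detection results.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical detected identifier: character offsets plus filter type name."""
    start: int
    end: int
    filter_type: str


def apply_redactions(text: str, spans: list[Span], redaction_format: str) -> str:
    """Replace each span with the format string, substituting %t with its type.

    Spans are applied right to left so earlier offsets stay valid.
    """
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        replacement = redaction_format.replace("%t", span.filter_type)
        text = text[: span.start] + replacement + text[span.end :]
    return text


text = "Contact john@example.com or call about SSN 123-45-6789."
email = "john@example.com"
ssn = "123-45-6789"
# Stand-in for detection: locate the two values directly in the text.
spans = [
    Span(text.index(email), text.index(email) + len(email), "email-address"),
    Span(text.index(ssn), text.index(ssn) + len(ssn), "ssn"),
]
print(apply_redactions(text, spans, "{{{REDACTED-%t}}}"))
# Contact {{{REDACTED-email-address}}} or call about SSN {{{REDACTED-ssn}}}.
```

Working from the end of the string backward is the design choice that matters here: replacing left to right would shift the offsets of every later span.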
The same policy in PhiSQL
Instead of hand-writing the policy JSON, you can express it in PhiSQL, which compiles to the same schema and can be version-controlled and reviewed like any other code:
```
POLICY contact_redaction;
REDACT EMAIL_ADDRESS WITH REDACT(format='{{{REDACTED-%t}}}');
REDACT SSN WITH REDACT(format='{{{REDACTED-%t}}}');
```
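To make "compiles to the same schema" concrete, here is a toy translator for just the two statement forms used above. It rests on assumptions inferred from this section's two examples, not on the real PhiSQL compiler: that `POLICY a_b` becomes the name `a-b`, that an identifier such as `EMAIL_ADDRESS` maps to the camelCase key `emailAddress` holding an `<key>FilterStrategies` list, and that `REDACT(format=...)` becomes a `REDACT` strategy with a `redactionFormat`. `compile_phisql` and `to_camel_case` are hypothetical names.

```python
import re

# Hypothetical grammar covering only the statements shown in this section.
POLICY_RE = re.compile(r"^POLICY\s+(\w+);$")
REDACT_RE = re.compile(r"^REDACT\s+(\w+)\s+WITH\s+REDACT\(format='([^']*)'\);$")


def to_camel_case(identifier: str) -> str:
    """EMAIL_ADDRESS -> emailAddress, SSN -> ssn (assumed naming rule)."""
    first, *rest = identifier.lower().split("_")
    return first + "".join(part.capitalize() for part in rest)


def compile_phisql(source: str) -> dict:
    """Translate the toy PhiSQL subset into a policy dict."""
    policy = {"name": None, "identifiers": {}}
    for raw in source.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        if m := POLICY_RE.match(line):
            # Assumption: underscores in the policy name become hyphens.
            policy["name"] = m.group(1).replace("_", "-")
        elif m := REDACT_RE.match(line):
            key = to_camel_case(m.group(1))
            strategy = {"strategy": "REDACT", "redactionFormat": m.group(2)}
            policy["identifiers"].setdefault(key, {}).setdefault(
                f"{key}FilterStrategies", []
            ).append(strategy)
        else:
            raise ValueError(f"unsupported statement: {line}")
    return policy


source = """
POLICY contact_redaction;
REDACT EMAIL_ADDRESS WITH REDACT(format='{{{REDACTED-%t}}}');
REDACT SSN WITH REDACT(format='{{{REDACTED-%t}}}');
"""
# Under the assumptions above, this equals the dict passed to Policy.from_dict.
print(compile_phisql(source))
```

Rejecting unknown statements, rather than skipping them, keeps a typo in a policy from silently producing a weaker policy.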
Translating the rest of your Presidio setup
The common Presidio building blocks map cleanly:
- A custom `PatternRecognizer` becomes a `DEFINE IDENTIFIER` statement (or a custom identifier entry in policy JSON), with no Python class to subclass: `DEFINE IDENTIFIER 'MRN' MATCHING '\bMRN[\s:#]*\d{5,}\b' CASE INSENSITIVE WITH REDACT(format='{{{REDACTED-MRN}}}');`
- NLP-based recognizers (spaCy, Stanza, or transformer models) become a `DETECT PHEYE` statement backed by PhEye PII/PHI-trained models: `DETECT PHEYE LABELS ('PERSON') WITH REDACT;`
- Anonymizer operators map to strategies: Presidio `replace` to a Phileas replace or static value, `redact` to `REDACT`, `mask` to `MASK`, `hash` to a hash strategy, and `encrypt` to crypto-replace. Strategies with no built-in Presidio equivalent include format-preserving encryption and date shifting: `REDACT SSN WITH FPE_ENCRYPT;` and `REDACT DATE WITH SHIFT(days=30);`
- Conditional logic that would be custom Python around Presidio becomes a `WHERE` predicate. For example, to keep only the last four digits of a credit card number when detection confidence is high: `REDACT CREDIT_CARD WITH LAST_4 WHERE CONFIDENCE > 0.85;`
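As a rough illustration of what three of these rules do to a value, here is a plain-Python sketch. It reuses the MRN regex from the `DEFINE IDENTIFIER` example verbatim, but the helper names are hypothetical, and two behaviors are assumptions rather than documented PhiSQL semantics: that `LAST_4` keeps only the final four digits, and that a value whose `WHERE` predicate fails is left untouched by that rule.

```python
import re
from datetime import date, timedelta

# Same pattern as the DEFINE IDENTIFIER example; CASE INSENSITIVE -> re.IGNORECASE.
MRN_PATTERN = re.compile(r"\bMRN[\s:#]*\d{5,}\b", re.IGNORECASE)


def redact_mrn(text: str) -> str:
    """Replace every MRN match with the fixed redaction format."""
    return MRN_PATTERN.sub("{{{REDACTED-MRN}}}", text)


def last_4(value: str) -> str:
    """Assumed LAST_4 behavior: keep only the final four digits."""
    digits = "".join(c for c in value if c.isdigit())
    return digits[-4:]


def credit_card_rule(value: str, confidence: float) -> str:
    """Sketch of: REDACT CREDIT_CARD WITH LAST_4 WHERE CONFIDENCE > 0.85."""
    # Assumption: when the predicate fails, this rule does not fire.
    return last_4(value) if confidence > 0.85 else value


def shift_date(value: date, days: int = 30) -> date:
    """Sketch of: REDACT DATE WITH SHIFT(days=30), a fixed forward offset."""
    return value + timedelta(days=days)


print(redact_mrn("Patient mrn: 0012345 admitted."))
# Patient {{{REDACTED-MRN}}} admitted.
print(credit_card_rule("4111 1111 1111 1234", confidence=0.92))
# 1234
print(credit_card_rule("4111 1111 1111 1234", confidence=0.60))
# 4111 1111 1111 1234
print(shift_date(date(2024, 1, 15)))
# 2024-02-14
```

Because every date moves by the same offset, the interval between any two shifted dates is preserved, which plain redaction would destroy.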
A practical migration ports your recognizers and operators first to reach parity, then revisits the policy to use the conditional rules, consistent pseudonymization, and format-preserving strategies that were custom code (or simply unavailable) under Presidio.