Both Phileas and Microsoft Presidio are open source, self-hosted libraries you embed in your own application to find and de-identify PII and PHI in text. Neither is a managed service, and with both the data stays inside your process, so “your data stays local” is not what separates them. It is an advantage both hold over SaaS redaction APIs, not one either holds over the other.
Presidio is the open source PII toolkit most developers find first, so if you are choosing a library to build on, it is usually the thing Phileas gets measured against. This page is an honest, library-to-library comparison from the team that builds Phileas. If you are instead weighing a turnkey, supported redaction service rather than a library to assemble, read Philter vs Microsoft Presidio, which compares the productized engine.
Where Presidio is genuinely strong
It is worth stating Presidio’s strengths plainly, because they are real and a few of them are areas where Presidio is ahead of Phileas today:
- Much larger community and brand. Microsoft-backed, MIT-licensed, `pip install presidio-analyzer` and go, with a wide set of recognizers out of the box.
- Image, OCR, and DICOM de-identification. The `presidio-image-redactor` package redacts PII burned into images and supports DICOM medical-imaging de-identification. Phileas focuses on text and PDF, so for image or medical-imaging work, Presidio is ahead.
- Structured and semi-structured data. `presidio-structured` targets tabular and JSON data. Phileas is built around free text.
- Very easy Python extensibility. Writing a custom recognizer is a small Python class, and there is a clear path to plug in transformer- or LLM-based recognizers alongside the default spaCy or Stanza models.
If your problem is image or DICOM de-identification, tabular data, or a Python-only prototype you want running this afternoon, Presidio is a perfectly good choice, and the two projects share the most important property: auditable, open source redaction that never ships your data to a vendor.
Side by side
| | Phileas | Microsoft Presidio |
|---|---|---|
| License | Apache 2.0 · open source | MIT · open source |
| Project type | Production engine behind Philter, with a commercial support path | Microsoft community project (community support only) |
| Language runtimes | Native Java, Python, and .NET on one shared policy schema | Python-first (REST-callable, but a single runtime) |
| Policy model | PhiSQL, a declarative policy language that compiles to a versioned, reviewable JSON schema | Recognizers and operators configured in code or YAML; no declarative policy language or compiled policy artifact |
| Detection models | PhEye PII/PHI-trained lenses (general, healthcare, more) with Apache OpenNLP heritage | General-purpose spaCy, Stanza, or transformer NER; bring or tune your own for domain accuracy |
| Conditional redaction | First-class: gate on attributes and content (zip population, age thresholds, detection confidence, patterns) | Recognizers detect entities; attribute-conditioned redaction is not first-class |
| Redaction strategies | Mask, redact, hash, AES-GCM crypto, FF3 format-preserving encryption, date-shift, consistent pseudonymization, synthetic replace | replace, redact, mask, hash (salted hash gives deterministic tokens), encrypt (AES), custom; no format-preserving encryption or date-shift |
| PDF redaction | Built in, rasterized (no recoverable text under the redaction) | Via the image redactor; PDF handling is more bespoke |
| Image / OCR / DICOM | Not a focus today (text and PDF) | Yes · presidio-image-redactor and DICOM de-identification |
| Structured / tabular | Free-text focus | Yes · presidio-structured |
| Surrounding toolkit | Scope (precision/recall), Arbiter (review), Phinder (discovery), Phield (monitoring), AI Proxy and MCP (LLM traffic) | Standalone detect-and-anonymize |
| Commercial path | Available via Philter (managed deploys, support) without closing the source | Community only |
We want this comparison to be accurate and fair. Open source projects move fast, so this reflects publicly documented behavior at the time of writing and may have changed since. Always verify against the current Presidio documentation before deciding, and if you spot anything inaccurate, please let us know and we will correct it.
Where Phileas differentiates
The honest framing is not “Phileas detects better.” Both are good detectors, and Presidio is ahead on image and structured data. Phileas pulls away on productization, the policy model, and the manipulation side, which is where teams moving a prototype into a governed production pipeline tend to hit walls with Presidio.
A governed policy model, with PhiSQL. Phileas redaction is driven by a formal, versioned JSON policy schema, and PhiSQL is a declarative language that compiles to it. Policies are version-controlled, reviewable, and testable like code, and you can hand one to a compliance owner. Presidio is configured in code or YAML; there is no equivalent policy artifact or DSL.
Conditional redaction. Phileas can redact based on attributes and content: only zip codes with population under a threshold, only ages over a cutoff, only IPs matching a pattern, gated on detection confidence, and so on. In Presidio, this kind of attribute-conditioned logic is not first-class.
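To make “attribute-conditioned” concrete, here is a minimal sketch in plain Python. It is not Phileas’s API or PhiSQL syntax: the `Span` type, the `ZIP_POPULATION` table (made-up numbers, not census data), the thresholds, and the helper names are all hypothetical. It only shows the control flow, where a detected span is redacted only if its confidence clears a floor and the rule for its entity type holds.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Span:
    """A detected entity: its type, character offsets, text, and detector confidence."""
    kind: str
    start: int
    end: int
    text: str
    confidence: float


# Hypothetical lookup table with made-up values, NOT real census data.
ZIP_POPULATION = {"12345": 8_000, "67890": 250_000}


def zip_is_sparse(span: Span, threshold: int = 20_000) -> bool:
    # Unknown zips default to 0, so they are treated as sparse and redacted.
    return ZIP_POPULATION.get(span.text, 0) < threshold


def age_over(span: Span, cutoff: int = 89) -> bool:
    return int(span.text) > cutoff


# One condition per entity kind (hypothetical rules, for illustration only).
CONDITIONS: dict[str, Callable[[Span], bool]] = {
    "ZIP": zip_is_sparse,
    "AGE": age_over,
}


def conditional_redact(text: str, spans: list[Span], min_confidence: float = 0.7) -> str:
    """Redact a span only if it clears the confidence floor AND its kind's condition holds."""
    out = text
    # Work right to left so replacing one span does not shift the offsets of earlier ones.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        if span.confidence < min_confidence:
            continue  # low-confidence detections are left alone
        condition = CONDITIONS.get(span.kind, lambda s: True)  # no rule: always redact
        if condition(span):
            out = out[: span.start] + f"[{span.kind}]" + out[span.end :]
    return out


def span_of(text: str, kind: str, value: str, confidence: float) -> Span:
    """Demo helper: build a Span for the first occurrence of value in text."""
    start = text.index(value)
    return Span(kind, start, start + len(value), value, confidence)


if __name__ == "__main__":
    note = "Patient age 93 in zip 12345; sister age 40 in zip 67890."
    found = [
        span_of(note, "AGE", "93", 0.95),
        span_of(note, "ZIP", "12345", 0.90),
        span_of(note, "AGE", "40", 0.95),
        span_of(note, "ZIP", "67890", 0.90),
    ]
    print(conditional_redact(note, found))
    # Patient age [AGE] in zip [ZIP]; sister age 40 in zip 67890.
```

The point of the sketch is that detection and the decision to redact are separate steps: the detector finds all four spans, but only the age over the cutoff and the zip below the population threshold are replaced.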
Richer, reversible manipulation. Beyond replace, redact, mask, hash, and encrypt, Phileas does consistent pseudonymization (the same person maps to the same token across a document or context), FF3 format-preserving encryption, AES-GCM crypto-replace, and date shifting. Presidio’s salted `hash` operator can produce deterministic tokens, which is a fair overlap, but format-preserving encryption and date-shift are not built in. The consistency and reversibility piece is what AI training data and analytics need.
One schema across Java, Python, and .NET. Phileas runs natively on all three runtimes against the same policy schema. Presidio is Python-only (REST-callable, but one runtime). For JVM and .NET shops, Phileas is native rather than a service hop.
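The consistent-pseudonymization and date-shift strategies described under “Richer, reversible manipulation” can be illustrated with a short plain-Python sketch using only the standard library. This is not how Phileas implements them; `ContextPseudonymizer`, its key and context arguments, and the ±30-day window are hypothetical. It covers consistency only: HMAC tokens are one-way, so reversible strategies such as FF3 or AES-GCM would need a real cipher and are not shown.

```python
import hashlib
import hmac
from datetime import date, timedelta


class ContextPseudonymizer:
    """Hypothetical helper: stable tokens and a stable date offset within one context."""

    def __init__(self, key: bytes, context: str) -> None:
        self._key = key          # secret key; without it tokens cannot be recomputed
        self._context = context  # e.g. a document or patient ID that scopes consistency

    def _mac(self, message: str) -> bytes:
        data = f"{self._context}|{message}".encode()
        return hmac.new(self._key, data, hashlib.sha256).digest()

    def token(self, kind: str, value: str) -> str:
        # Same (key, context, kind, value) always yields the same token,
        # so one person maps to one token everywhere in the context.
        return f"{kind}_{self._mac(kind + '|' + value).hex()[:8]}"

    def shift_date(self, d: date, max_days: int = 30) -> date:
        # One offset per context, in [-max_days, max_days], applied to every
        # date, so the intervals between dates are preserved.
        raw = int.from_bytes(self._mac("DATE")[:4], "big")
        offset = raw % (2 * max_days + 1) - max_days
        return d + timedelta(days=offset)


if __name__ == "__main__":
    p = ContextPseudonymizer(b"demo-key-not-for-production", "doc-42")
    print(p.token("PERSON", "Ada Lovelace"), p.token("PERSON", "Ada Lovelace"))
    admit, discharge = date(2024, 1, 3), date(2024, 1, 10)
    print(p.shift_date(discharge) - p.shift_date(admit))  # 7 days, 0:00:00
```

Scoping the offset to a context rather than to each date is the design choice that keeps shifted records analytically useful: a length of stay or time between visits survives even though the real dates do not.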
Purpose-built models. Detection runs on PhEye PII/PHI-trained models and a swappable lens catalog, with Apache OpenNLP heritage behind it. Presidio defaults to general-purpose NER and expects you to bring or tune models for domain accuracy.
A lifecycle, not just a library. Phileas sits in a toolkit: Philter Scope scores policies on precision and recall against gold-standard data, Arbiter does human review, Phinder does discovery, Phield does monitoring, and the AI Proxy and MCP server cover LLM traffic. Presidio is standalone.
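How Philter Scope computes its scores is not described here, but the underlying idea of scoring a policy on precision and recall against gold-standard data can be sketched in a few lines of plain Python. The `(start, end, type)` span tuple and the choice to score an empty denominator as 1.0 are assumptions of this sketch, and it uses strict exact matching; partial-overlap scoring is a common, more lenient alternative.

```python
from typing import Iterable

# A span as (start offset, end offset, entity type).
Span = tuple[int, int, str]


def precision_recall(predicted: Iterable[Span], gold: Iterable[Span]) -> tuple[float, float]:
    """Exact-match precision and recall of predicted spans against gold-standard spans."""
    pred, truth = set(predicted), set(gold)
    true_positives = len(pred & truth)  # same offsets AND same type
    # Convention chosen here: an empty denominator scores 1.0 (nothing to get wrong).
    precision = true_positives / len(pred) if pred else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


if __name__ == "__main__":
    gold = [(0, 5, "NAME"), (10, 15, "ZIP"), (20, 22, "AGE")]
    predicted = [(0, 5, "NAME"), (10, 15, "ZIP"), (30, 35, "NAME"), (40, 45, "NAME")]
    p, r = precision_recall(predicted, gold)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

For redaction, the two numbers fail in different directions: low recall means PII slipped through unredacted, while low precision means ordinary text was redacted needlessly.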
One-line positioning
Presidio is a flexible Python toolkit for building PII detection. Phileas is a production redaction engine: governed policies you can version and review, reversible and consistent anonymization, native Java, Python, and .NET, and a full discover, redact, review, measure, and monitor toolkit around it.
Moving from Presidio
If you are already on Presidio, the migrate from Presidio guide maps Presidio recognizers to Phileas filters and Presidio anonymizer operators to Phileas strategies and PhiSQL, with before-and-after code.
Philter, Phileas, or Presidio?
Two of these come from Philterd and one is the alternative. They are less rivals than three different shapes of tool. The quick way to choose:
Philter
A turnkey, self-hosted redaction API other systems call over HTTP, shipping with NLP models, cloud-marketplace deploys, and a commercial-support path.
Choose when: you want a running service to point pipelines at, not a library to build into one.
Phileas
The open source library underneath Philter. Embed it in a Java, Python, or .NET application and redact in-process, governed by a versioned policy and PhiSQL.
Choose when: you want redaction inside your own process with no extra service to run. This is the closest like-for-like with Presidio.
Microsoft Presidio
Microsoft's open source, Python-first PII toolkit. You assemble the analyzer and anonymizer yourself, with spaCy or transformer recognizers.
Choose when: your stack is Python-only and you are happy to operate it yourself, or you need its image, DICOM, or structured-data coverage.
Further reading
- Migrate from Presidio to Phileas: concept mapping, recognizer translation, and before-and-after code.
- Philter vs Microsoft Presidio: the same comparison at the product level, if you want a turnkey supported service rather than an embedded library.
- PhiSQL: the declarative policy language that replaces hand-wiring recognizers and operators.
- PhEye lens catalog: the purpose-trained PII and PHI models behind Phileas detection.