Library comparison

Phileas vs Microsoft Presidio

Both Phileas and Microsoft Presidio are open source, self-hosted libraries you embed in your own application to find and de-identify PII and PHI in text. Neither is a managed service, and with both the data stays inside your process, so “your data stays local” is not what separates them. That property is the wedge against SaaS redaction APIs, not a difference between the two libraries.

Presidio is the open source PII toolkit most developers find first, so if you are choosing a library to build on, it is usually the thing Phileas gets measured against. This page is an honest, library-to-library comparison from the team that builds Phileas. If you are instead weighing a turnkey, supported redaction service rather than a library to assemble, read Philter vs Microsoft Presidio, which compares the productized engine.

Where Presidio is genuinely strong

It is worth stating Presidio’s strengths plainly, because they are real and a few of them are areas where Presidio is ahead of Phileas today:

Much larger community and brand. Microsoft-backed, MIT-licensed, pip install presidio-analyzer and go, with a wide set of recognizers out of the box.
Image, OCR, and DICOM de-identification. The presidio-image-redactor package redacts PII burned into images and supports DICOM medical-imaging de-identification. Phileas focuses on text and PDF, so for image or medical-imaging work, Presidio is ahead.
Structured and semi-structured data. presidio-structured targets tabular and JSON data. Phileas is built around free text.
Very easy Python extensibility. Writing a custom recognizer is a small Python class, and there is a clear path to plug in transformer- or LLM-based recognizers alongside the default spaCy or Stanza models.

If your problem is image or DICOM de-identification, tabular data, or a Python-only prototype you want running this afternoon, Presidio is a perfectly good choice, and the two projects share the most important property: auditable, open source redaction that never ships your data to a vendor.

Side by side

	Phileas	Microsoft Presidio
License	Apache 2.0 · open source	MIT · open source
Project type	Production engine behind Philter, with a commercial support path	Microsoft community project (community support only)
Language runtimes	Native Java, Python, and .NET on one shared policy schema	Python-first (REST-callable, but a single runtime)
Policy model	PhiSQL, a declarative policy language that compiles to a versioned, reviewable JSON schema	Recognizers and operators configured in code or YAML; no declarative policy language or compiled policy artifact
Detection models	PhEye PII/PHI-trained lenses (general, healthcare, more) with Apache OpenNLP heritage	General-purpose spaCy, Stanza, or transformer NER; bring or tune your own for domain accuracy
Conditional redaction	First-class: gate on attributes and content (zip population, age thresholds, detection confidence, patterns)	Recognizers detect entities; attribute-conditioned redaction is not first-class
Redaction strategies	Mask, redact, hash, AES-GCM crypto, FF3 format-preserving encryption, date-shift, consistent pseudonymization, synthetic replace	replace, redact, mask, hash (salted hash gives deterministic tokens), encrypt (AES), custom; no format-preserving encryption or date-shift
PDF redaction	Built in, rasterized (no recoverable text under the redaction)	Via the image redactor; PDF handling is more bespoke
Image / OCR / DICOM	Not a focus today (text and PDF)	Yes · `presidio-image-redactor` and DICOM de-identification
Structured / tabular	Free-text focus	Yes · `presidio-structured`
Surrounding toolkit	Scope (precision/recall), Arbiter (review), Phinder (discovery), Phield (monitoring), AI Proxy and MCP (LLM traffic)	Standalone detect-and-anonymize
Commercial path	Available via Philter (managed deploys, support) without closing the source	Community only

We want this comparison to be accurate and fair. Open source projects move fast, so this reflects publicly documented behavior at the time of writing and may have changed since. Always verify against the current Presidio documentation before deciding, and if you spot anything inaccurate, please let us know and we will correct it.

Where Phileas differentiates

The honest framing is not “Phileas detects better.” Both are good detectors, and Presidio is ahead on image and structured data. Phileas pulls away on productization, the policy model, and the manipulation side (what happens to an entity once it has been detected), which is where teams moving a prototype into a governed production pipeline tend to hit walls with Presidio.

A governed policy model, with PhiSQL. Phileas redaction is driven by a formal, versioned JSON policy schema, and PhiSQL is a declarative language that compiles to it. Policies are version-controlled, reviewable, and testable like code, and you can hand one to a compliance owner. Presidio is configured in code or YAML; there is no equivalent policy artifact or DSL.
Conditional redaction. Phileas can redact based on attributes and content: only zip codes with population under a threshold, only ages over a cutoff, only IPs matching a pattern, gated on detection confidence, and so on. In Presidio, this kind of attribute-conditioned logic is not first-class.
Richer, reversible manipulation. Beyond replace, redact, mask, hash, and encrypt, Phileas does consistent pseudonymization (the same person maps to the same token across a document or context), FF3 format-preserving encryption, AES-GCM crypto-replace, and date shifting. Presidio’s salted hash operator can produce deterministic tokens, which is a fair overlap, but format-preserving encryption and date-shift are not built in. The consistency and reversibility piece is what AI training data and analytics need.
One schema across Java, Python, and .NET. Phileas runs natively on all three runtimes against the same policy schema. Presidio is Python-first (REST-callable, but one runtime). For JVM and .NET shops, Phileas is native rather than a service hop.
Purpose-built models. Detection runs on PhEye PII/PHI-trained models and a swappable lens catalog, with Apache OpenNLP heritage behind it. Presidio defaults to general-purpose NER and expects you to bring or tune models for domain accuracy.
A lifecycle, not just a library. Phileas sits in a toolkit: Philter Scope scores policies on precision and recall against gold-standard data, Arbiter does human review, Phinder does discovery, Phield does monitoring, and the AI Proxy and MCP server cover LLM traffic. Presidio is standalone.
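
To make two of the ideas above concrete, here is a minimal sketch in plain Python of conditional redaction (gating on attributes and detection confidence) and consistent pseudonymization (the same person always maps to the same token). It is an illustration of the concepts only, not the Phileas API or PhiSQL: the Span class, the POLICY keys, the SECRET_KEY, and the function names are all hypothetical, and the thresholds are made up for the example. The keyed hash shown here gives consistency but is not reversible; reversibility would need an encryption-based strategy instead.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class Span:
    """A detected entity. Field names are hypothetical, for illustration."""
    entity_type: str
    text: str
    confidence: float
    attributes: dict


# Hypothetical policy values: redact zip codes only for small populations,
# ages only above a cutoff, and nothing below a confidence floor.
POLICY = {
    "min_confidence": 0.6,
    "zip_population_under": 20_000,
    "age_over": 89,
}

# Assumption: one secret per document set or context, kept out of source control.
SECRET_KEY = b"demo-key-change-me"


def should_redact(span: Span, policy: dict = POLICY) -> bool:
    """Decide whether a detected span is redacted under the policy."""
    if span.confidence < policy["min_confidence"]:
        return False
    if span.entity_type == "ZIP_CODE":
        # Missing population defaults to 0, so unknown zips are redacted.
        return span.attributes.get("population", 0) < policy["zip_population_under"]
    if span.entity_type == "AGE":
        # Missing age defaults to infinity, so unknown ages are redacted.
        return span.attributes.get("years", float("inf")) > policy["age_over"]
    return True


def pseudonym(span: Span, key: bytes = SECRET_KEY) -> str:
    """Return a stable token: same type and (case-folded) text give the same token."""
    message = f"{span.entity_type}:{span.text.lower()}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()
    return f"{span.entity_type}_{digest[:8]}"


def apply_policy(spans: list) -> list:
    """Replace spans the policy selects; pass the rest through unchanged."""
    return [pseudonym(s) if should_redact(s) else s.text for s in spans]
```

Because the token is derived from a keyed hash of the entity type and text, two mentions of the same name receive the same replacement without any lookup table, while a different key yields an unrelated set of tokens. The gating function is deliberately separate from the replacement function, which mirrors the split between deciding what to redact and deciding how.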

One-line positioning

Presidio is a flexible Python toolkit for building PII detection. Phileas is a production redaction engine: governed policies you can version and review, reversible and consistent pseudonymization, native Java, Python, and .NET, and a full discover, redact, review, measure, and monitor toolkit around it.

Moving from Presidio

If you are already on Presidio, the migrate from Presidio guide maps Presidio recognizers to Phileas filters and Presidio anonymizer operators to Phileas strategies and PhiSQL, with before-and-after code.


Philter, Phileas, or Presidio?

Two of these come from Philterd, and one is the alternative. They are not really rivals so much as three different shapes. The quick way to choose:

Philter

A turnkey, self-hosted redaction API other systems call over HTTP, shipping with NLP models, cloud-marketplace deploys, and a commercial-support path.

Choose when: you want a running service to point pipelines at, not a library to build into one.

Phileas

The open source library underneath Philter. Embed it in a Java, Python, or .NET application and redact in-process, governed by a versioned policy and PhiSQL.

Choose when: you want redaction inside your own process with no extra service to run. This is the closest like-for-like with Presidio.

Microsoft Presidio

Microsoft's open source, Python-first PII toolkit. You assemble the analyzer and anonymizer yourself, with spaCy or transformer recognizers.

Choose when: your stack is Python-only and you are happy to operate it yourself, or you need its image, DICOM, or structured-data coverage.

Run the same workload through Philter

Deploy from your cloud marketplace in 5 minutes, or get a 30-minute architecture review with Jeff. He'll walk through your stack and give you an honest read on the comparison. No sales pitch.
