Multilingual Medical PII Redaction: English and French with Philter

Healthcare data is not only in English. Clinical notes, intake forms, and patient correspondence in Canada flow in both English and French, and organizations operating under PIPEDA or Quebec’s Law 25 have to de-identify both. An English-only redaction pipeline leaves PHI exposed in every French document it touches.

Philter handles this through its policy system: one running instance, one API endpoint, and a separate policy file for each language. The policy you apply to a given request determines which entity detectors fire and which locale-specific identifiers are recognized. There is no need to run separate instances or route text through different services.

This post walks through a working demo that redacts English and French clinical text, covering persons, medical conditions, and locale-aware identifiers.

The demo

The philterd/philter-multilingual-medical repository ships two policy files and a docker-compose.yaml that brings up Philter along with the Ph.Eye NLP models it needs. Clone the repo and run:

docker-compose up

Once the containers are ready, you can send text to Philter and select a policy per request using the p query parameter.

The English policy

The english policy activates three detectors:

  • Persons via a Ph.Eye model trained on English names
  • Medical conditions (diseases and disorders) via a Ph.Eye medical-condition model
  • US Social Security Numbers via Philter’s built-in SSN pattern matcher

A request using this policy:

curl "http://localhost:8080/api/filter?p=english" \
  --data "George Washington was president and his ssn was 123-45-6789 with diabetes and high blood pressure." \
  -H "Content-type: text/plain"

Response:

{{{REDACTED-person}}} was president and his ssn was {{{REDACTED-ssn}}} with {{{REDACTED-medical-condition}}} and {{{REDACTED-medical-condition}}}.

The person name, the SSN, and both medical conditions are redacted. Non-sensitive text passes through unchanged.
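For callers that prefer a script to curl, the same request can be sketched in Python using only the standard library. The helper names build_filter_request and redact are hypothetical and not part of Philter; the URL, the p query parameter, and the text/plain content type are taken from the curl command above.

```python
import urllib.parse
import urllib.request

# Assumption: Philter is reachable at the demo's local address.
PHILTER_URL = "http://localhost:8080/api/filter"


def build_filter_request(text: str, policy: str) -> urllib.request.Request:
    """Build a POST request mirroring the curl command: policy in ?p=, text in the body."""
    query = urllib.parse.urlencode({"p": policy})
    return urllib.request.Request(
        url=f"{PHILTER_URL}?{query}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def redact(text: str, policy: str, timeout: float = 30.0) -> str:
    """Send the text to Philter and return the response body as a string."""
    request = build_filter_request(text, policy)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the demo containers running, calling redact with the sentence above and the policy name "english" should return the response shown. Keeping request construction separate from sending makes the URL and header logic checkable without a live server.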

The French policy

The french policy swaps in French-language detectors:

  • Persons via a Ph.Eye model trained on French names
  • Medical conditions labeled with the French term Maladie (disease)
  • Canadian identifiers via a custom regex pattern (\d{3}-\d{3}-\d{3}) that matches the Canadian three-group, nine-digit format rather than the US SSN format

A French request:

curl "http://localhost:8080/api/filter?p=french" \
  --data "Je m'appelle Mary et je suis diabétique." \
  -H "Content-type: text/plain"

Response:

Je m'appelle {{{REDACTED-person}}} et je suis {{{REDACTED-medical-condition}}}.

The same engine, the same API endpoint, a different policy. The locale-specific identifier behavior is visible when the input contains a Canadian ID number. Under the English policy, 123-45-6789 (US SSN format) is redacted as {{{REDACTED-ssn}}}. Under the French policy, 123-456-789 (Canadian format) is redacted as {{{REDACTED-canadian-id}}}.
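The difference between the two identifier formats can be checked locally with a short sketch. The Canadian pattern is the one quoted in the French policy description; the US SSN pattern here is a simplified stand-in chosen for illustration, and Philter's built-in SSN matcher may be stricter. The function classify_identifier is a hypothetical helper.

```python
import re

# Pattern quoted in the French policy description.
CANADIAN_ID = re.compile(r"\d{3}-\d{3}-\d{3}")

# Assumption: a simplified 3-2-4 digit pattern, not Philter's actual SSN matcher.
US_SSN = re.compile(r"\d{3}-\d{2}-\d{4}")


def classify_identifier(value: str) -> str:
    """Return which of the two formats the whole string matches, if any."""
    if US_SSN.fullmatch(value):
        return "ssn"
    if CANADIAN_ID.fullmatch(value):
        return "canadian-id"
    return "none"
```

Here classify_identifier("123-45-6789") returns "ssn" and classify_identifier("123-456-789") returns "canadian-id". One thing to watch when testing on real data: applied without boundaries, the three-group pattern also matches the first eleven characters of a ten-digit phone number such as 123-456-7890, so it is worth checking how the policy behaves on documents that contain phone numbers.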

A note on the French models

The French policy currently uses the same underlying NLP models as the English policy for some entity types. Until dedicated French models are integrated, detection quality on French text for those entity types depends on how well the shared models generalize to French. For production use in French-dominant environments, validate the French policy against a representative sample of your own data and measure precision and recall before deploying.
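As a rough illustration of that validation step, precision and recall can be computed by comparing hand-labeled spans against the spans a policy actually redacted. This is a minimal sketch using exact-match scoring; the Span type and the precision_recall function are hypothetical, and how you collect the detected spans from your deployment is left open.

```python
# A span is (start offset, end offset, entity type); an illustrative convention.
Span = tuple[int, int, str]


def precision_recall(expected: set[Span], detected: set[Span]) -> tuple[float, float]:
    """Score detected spans against hand-labeled ones using exact matches only."""
    true_positives = len(expected & detected)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical labels for "Je m'appelle Mary et je suis diabétique."
expected_spans = {(13, 17, "person"), (29, 39, "medical-condition")}

# Hypothetical outcome in which only the name was detected.
detected_spans = {(13, 17, "person")}

precision, recall = precision_recall(expected_spans, detected_spans)
```

In this made-up case precision is 1.0 and recall is 0.5: everything redacted was correct, but half of the labeled entities were missed. For de-identification, recall is usually the number to watch, since a missed entity is exposed PHI.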

How the policy selection works

The p query parameter tells Philter which policy file to load for that request. The policy files live in the policies/ directory of your Philter deployment. Each file is a JSON document that declares which detectors to enable and how each one should behave, including the redaction format string and any custom identifier patterns.

Switching the active policy per request means you can handle mixed-language document sets through a single Philter instance. A document classification step upstream can determine the language and set p accordingly.
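The upstream classification step might look like the sketch below. It uses a deliberately naive stopword count to pick a policy name; the word lists and the choose_policy function are illustrative assumptions, and a real pipeline would more likely rely on a dedicated language-identification library.

```python
import re

# Assumption: tiny illustrative hint lists, not a real language model.
FRENCH_HINTS = {"je", "le", "la", "les", "et", "est", "suis", "une", "des", "du", "avec"}
ENGLISH_HINTS = {"the", "and", "is", "was", "his", "her", "with", "of", "to"}


def choose_policy(text: str, default: str = "english") -> str:
    """Pick the policy name for the p parameter based on stopword counts."""
    # Extract runs of letters (including accented ones), lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    french_score = sum(word in FRENCH_HINTS for word in words)
    english_score = sum(word in ENGLISH_HINTS for word in words)
    if french_score > english_score:
        return "french"
    if english_score > french_score:
        return "english"
    return default
```

On the two demo sentences, this returns "french" for the French input and "english" for the English one. Text with no recognizable hint words falls back to the default, so it is worth logging those cases and reviewing them rather than trusting the fallback.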

Running on your own data

The demo is a starting point. To adapt it:

  1. Clone the repo and run docker-compose up to verify the baseline works.
  2. Open policies/english.json and policies/french.json to see the full detector configuration.
  3. Use the Redaction Policy Editor to build or modify policies without editing JSON by hand.
  4. Send sample documents from your own corpus and review the output before widening to production volume.
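When reviewing output in step 4, a quick tally of redaction tags per entity type helps spot documents where a detector never fired. The sketch below assumes the tag format shown in the demo responses, such as {{{REDACTED-person}}}; a policy with a different redaction format string would need a different pattern. The count_redactions helper is hypothetical.

```python
import re
from collections import Counter

# Assumption: tags follow the demo's {{{REDACTED-<type>}}} format.
REDACTION_TAG = re.compile(r"\{\{\{REDACTED-([a-z-]+)\}\}\}")


def count_redactions(redacted_text: str) -> Counter:
    """Count redaction tags in Philter output, grouped by entity type."""
    return Counter(REDACTION_TAG.findall(redacted_text))


# The English response from the demo.
sample_output = (
    "{{{REDACTED-person}}} was president and his ssn was {{{REDACTED-ssn}}} "
    "with {{{REDACTED-medical-condition}}} and {{{REDACTED-medical-condition}}}."
)
counts = count_redactions(sample_output)
```

For the demo response, counts holds one person, one ssn, and two medical-condition entries. A count of zero for a type you expect in a document is a prompt to read that document by hand.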

Philter runs entirely inside your environment. Text never leaves the container. That applies equally to both language policies, which matters for health data governed by Law 25’s strict residency requirements.

Deploy

The demo uses docker-compose for local testing. For production, Philter is available on the AWS Marketplace, the Google Cloud Marketplace, and the Microsoft Azure Marketplace, or you can run it on any host that supports Docker.

For organizations subject to PIPEDA or Quebec’s Law 25, see the PIPEDA and Law 25 compliance guide for data residency requirements, Canadian identifier handling, and deployment guidance.

If you need multilingual PII or PHI redaction in your clinical or health-data pipeline, get in touch. We can help you tune the language policies for your specific document types and compliance requirements.