Healthcare · Philterd

Clinical Notes De-Identification (Date-Shifted)

De-identify clinical notes for research, ML training, or analytics — preserving temporal relationships via per-patient date shifting.

v1.0.0 · Updated 2026-05-18 · Philter >=3.0.0 · By Philterd
Tags: HIPAA, PHI, clinical notes, date shifting, research

The policy

The full clinical-notes-deid.json file, with the same content you’d get by downloading it.

{
  "name": "clinical-notes-deid",
  "config": {
    "splitting": {
      "enabled": true,
      "threshold": 8000
    }
  },
  "ignored": [],
  "identifiers": {
    "personsName": {
      "personsFilterStrategies": [
        {"strategy": "REPLACE", "redactionFormat": "[PATIENT]", "conditions": "confidence > 60"}
      ]
    },
    "physicianName": {
      "physicianNameFilterStrategies": [
        {"strategy": "REPLACE", "redactionFormat": "[PROVIDER]"}
      ]
    },
    "date": {
      "onlyValidDates": true,
      "dateFilterStrategies": [
        {"strategy": "SHIFT", "shiftDays": "RANDOM(-90,90)"}
      ]
    },
    "age": {
      "ageFilterStrategies": [
        {"strategy": "REDACT", "redactionFormat": "[AGE>89]", "conditions": "context == \"age\" > 89"}
      ]
    },
    "phoneNumber": {
      "phoneNumberFilterStrategies": [
        {"strategy": "REDACT", "redactionFormat": "[PHONE]"}
      ]
    },
    "ssn": {
      "ssnFilterStrategies": [
        {"strategy": "REDACT", "redactionFormat": "[SSN]"}
      ]
    },
    "emailAddress": {
      "emailAddressFilterStrategies": [
        {"strategy": "REDACT", "redactionFormat": "[EMAIL]"}
      ]
    },
    "hospital": {
      "hospitalFilterStrategies": [
        {"strategy": "REPLACE", "redactionFormat": "[FACILITY]"}
      ]
    },
    "identifiers": [
      {
        "id": "mrn",
        "pattern": "\\bMRN[\\s:#]*\\d{5,}\\b",
        "caseSensitive": false,
        "identifierFilterStrategies": [
          {"strategy": "REDACT", "redactionFormat": "[MRN]"}
        ]
      }
    ]
  }
}

Example

Input

Dr. Garcia saw Mr. Smith (MRN 47291) on 2025-03-14 for follow-up of his 2024-12-08 admission. Patient is 72 years old.

Output

[PROVIDER] saw [PATIENT] ([MRN]) on 2025-04-26 for follow-up of his 2025-01-20 admission. Patient is 72 years old.

Entities this policy acts on

NAME, PROVIDER, DATE, AGE, PHONE, SSN, EMAIL, HOSPITAL, MRN

What this policy does

Tailored for research and ML use cases that need both patient privacy and an analyzable temporal structure of clinical events. It differs from the strict HIPAA Safe Harbor policy in three ways:

  1. Per-patient date shifting instead of full date redaction. Each patient’s dates are shifted by the same random offset (±90 days), preserving intervals between events while breaking linkage to the actual calendar dates.
  2. Replacement tokens ([PATIENT], [PROVIDER], [FACILITY]) instead of full redaction. This makes the text more readable for human reviewers and easier for downstream NLP.
  3. Confidence-gated name detection (confidence > 60) reduces over-redaction of common English words that Philter’s NER occasionally misfires on in clinical text.
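
As an illustration of why a shared per-patient offset preserves intervals, here is a minimal Python sketch. It is not Philter's implementation: the keyed-hash derivation, the function names, the patient ID, and the secret are all assumptions made for the example. The only thing taken from the policy is the ±90-day window.

```python
import hashlib
import hmac
from datetime import date, timedelta

WINDOW_DAYS = 90  # mirrors the ±90-day window in the policy


def patient_offset(patient_id: str, secret: bytes) -> int:
    """Derive a stable offset in [-WINDOW_DAYS, WINDOW_DAYS] for one patient.

    Hypothetical helper: a keyed hash gives the same offset every time the
    same patient ID and secret are supplied.
    """
    digest = hmac.new(secret, patient_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % (2 * WINDOW_DAYS + 1) - WINDOW_DAYS


def shift(d: date, offset_days: int) -> date:
    """Move a date by the patient's offset."""
    return d + timedelta(days=offset_days)


secret = b"example-secret"  # hypothetical key; keep a real one out of source control
offset = patient_offset("patient-001", secret)

visit = date(2025, 3, 14)
admission = date(2024, 12, 8)
shifted_visit = shift(visit, offset)
shifted_admission = shift(admission, offset)

# The gap between the two events is the same before and after shifting.
print((visit - admission).days, (shifted_visit - shifted_admission).days)
```

Because both dates move by the same number of days, any interval computed within one patient's record is unchanged, while the absolute calendar dates no longer match the originals.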

Ages under 90 are preserved (clinically relevant). Ages 90+ get the [AGE>89] token per HIPAA Safe Harbor §164.514(b)(2)(i)(C).
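
The age rule can be sketched in a few lines of Python. This only illustrates the 90+ threshold described above; the regular expression, the function name, and the choice to replace just the number are assumptions for the example, and Philter's own age detection may match and replace different spans.

```python
import re

# Hypothetical pattern: a number followed by "years old" or "-year-old".
AGE_PATTERN = re.compile(r"\b(\d{1,3})(?=[- ]years?[- ]old\b)", re.IGNORECASE)


def mask_high_ages(text: str) -> str:
    """Replace ages of 90 or more with the [AGE>89] token; keep younger ages."""
    def replace(match: re.Match) -> str:
        age = int(match.group(1))
        return "[AGE>89]" if age >= 90 else match.group(0)

    return AGE_PATTERN.sub(replace, text)


print(mask_high_ages("Patient is 72 years old."))
print(mask_high_ages("Patient is 93 years old."))
```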

When to customize

  • Date shift window. ±90 days works for most cohort studies. For oncology timelines or longitudinal studies that span years, widen the window to ±365 or more.
  • Replacement tokens. If your downstream pipeline expects specific tokens (e.g., spaCy’s PERSON or HuggingFace’s [NAME]), edit the redactionFormat fields.
  • Confidence threshold. > 60 is conservative. For higher recall (catching more names at the cost of more false positives), lower it to > 40. For higher-precision research datasets, where false positives are worse than false negatives, raise it to > 80.
  • MRN regex. Same caveat as the Safe Harbor policy — adjust for your EHR’s format.
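
Before changing the MRN pattern, it can help to try it against a few sample strings. The sketch below uses Python's re module with the pattern from the policy. Philter runs on Java, so this is only an approximation, although the constructs used here behave the same way in both engines for ASCII text. The sample strings and the function name are made up for the example.

```python
import re

# Pattern copied from the policy's "mrn" identifier; caseSensitive is false there.
MRN_PATTERN = re.compile(r"\bMRN[\s:#]*\d{5,}\b", re.IGNORECASE)


def redact_mrn(text: str) -> str:
    """Replace every MRN match with the [MRN] token."""
    return MRN_PATTERN.sub("[MRN]", text)


samples = [
    "Seen today (MRN 47291).",  # label, space, five digits: matches
    "mrn#0012345",              # lowercase label with '#': matches
    "MRN: 1234",                # only four digits: no match
    "Chart no. 47291",          # no MRN label: no match
]

for sample in samples:
    print(f"{sample!r} -> {redact_mrn(sample)!r}")
```

The last two samples show where the pattern falls short: shorter numbers and record numbers written without an MRN label pass through untouched, which is the kind of gap to look for in your own EHR's format.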

When NOT to use this policy

  • For publication or sharing outside your covered entity. This policy is more permissive than Safe Harbor — it keeps year-month-day structure (just shifted) and uses semantic tokens that are easier to re-identify than full redaction. For external sharing, use hipaa-safe-harbor.json instead.
  • For PCI or financial workloads. Use the finance/ policies instead.

Compliance notes

Date shifting is a recognized de-identification technique under the HHS Expert Determination method, but it does NOT meet HIPAA Safe Harbor on its own. Sharing data redacted with this policy outside the covered entity requires a qualified statistician’s certification under 45 CFR 164.514(b)(1).

For internal research use within the covered entity, this policy together with an executed Business Associate Agreement (where applicable) is generally sufficient. Confirm with your IRB or compliance office.

Use this policy

Download and load into your running Philter instance:

# Download the policy
curl -O https://raw.githubusercontent.com/philterd/pii-redaction-policies/main/policies/philterd/healthcare/clinical-notes-deid.json

# Upload to your Philter instance
curl -X POST http://localhost:8080/api/policies \
     -H "Content-Type: application/json" \
     --data @clinical-notes-deid.json

# Redact text using the policy
curl "http://localhost:8080/api/filter?p=clinical-notes-deid" \
     --data "your text here" \
     -H "Content-Type: text/plain"
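
If you would rather call the same endpoints from Python, the following sketch builds equivalent requests with the standard library. The endpoint paths, query parameter, and content types are taken from the curl commands above. The function names and BASE_URL constant are assumptions for the example, and the code only constructs the requests, so it runs without a Philter instance.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080"  # same host and port as the curl commands


def upload_request(policy_json: bytes) -> urllib.request.Request:
    """Build the POST that uploads a policy file's contents."""
    return urllib.request.Request(
        f"{BASE_URL}/api/policies",
        data=policy_json,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def filter_request(text: str, policy: str = "clinical-notes-deid") -> urllib.request.Request:
    """Build the POST that sends plain text to be filtered with a named policy."""
    query = urllib.parse.urlencode({"p": policy})
    return urllib.request.Request(
        f"{BASE_URL}/api/filter?{query}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


request = filter_request("your text here")
print(request.get_method(), request.full_url)
```

To send a request, pass it to urllib.request.urlopen and read the response body. Error handling and authentication are left out here and depend on how your instance is configured.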
