Case Studies

Redaction in Production

How organizations use the Philterd toolkit to solve real PII and PHI redaction problems.

Legal · Philter

Bankruptcy Filing Redaction for a Law Firm

Challenge

The firm handled federal bankruptcy filings and had to strip PII from Microsoft Word documents to comply with <a href="https://www.law.cornell.edu/rules/frcp/rule_9037" target="_blank" rel="noopener noreferrer">Federal Rule of Bankruptcy Procedure 9037</a>: SSNs and TINs reduced to the last four digits, birthdates to the year, financial account numbers to the last four, and minors' names to initials. The firm's IT was a small outsourced team mid-migration to AWS, with no one on staff to design or operate a redaction pipeline.
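The truncation rules listed above can be sketched in a few lines of Python. This is an illustration only, not Philter's configuration or the firm's deployment: the regular expressions, the `X` masking character, and every function name are assumptions, and the date pattern treats any `MM/DD/YYYY` date as a birthdate.

```python
import re

# Illustrative patterns only; real filings use more formats than these.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")        # 123-45-6789
EIN_RE = re.compile(r"\b\d{2}-\d{3}(\d{4})\b")         # 12-3456789
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/(\d{4})\b")   # 03/14/1985
ACCOUNT_RE = re.compile(r"\b\d{8,17}\b")               # bare digit runs


def truncate_ssn(text: str) -> str:
    """Keep only the last four digits of an SSN."""
    return SSN_RE.sub(r"XXX-XX-\1", text)


def truncate_ein(text: str) -> str:
    """Keep only the last four digits of an EIN-style TIN."""
    return EIN_RE.sub(r"XX-XXX\1", text)


def truncate_birthdate(text: str) -> str:
    """Reduce a full date to its year."""
    return DATE_RE.sub(r"\1", text)


def truncate_account(text: str) -> str:
    """Mask all but the last four digits of an account number."""
    def mask(match: re.Match) -> str:
        digits = match.group()
        return "X" * (len(digits) - 4) + digits[-4:]
    return ACCOUNT_RE.sub(mask, text)


def to_initials(name: str) -> str:
    """Reduce a name to initials, e.g. for a minor."""
    return "".join(part[0].upper() + "." for part in name.split())


def apply_truncation(text: str) -> str:
    """Apply the structured-identifier rules in a fixed order."""
    for step in (truncate_ssn, truncate_ein, truncate_birthdate, truncate_account):
        text = step(text)
    return text
```

Names are left out of `apply_truncation` because finding a minor's name in free text needs a dictionary or an NER model rather than a pattern; `to_initials` only shows the final transformation once a name has been found.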

Solution

We did the build. We annotated the firm's sample documents, configured Philter's SSN/TIN and identifier filters for the firm's account-number formats, and used the NER filter to find names so that minors' names could be reduced to initials. We stood up the AWS resources to host Philter with encryption at rest and in transit, so the deployment served both on-premises and cloud workloads and kick-started the firm's broader cloud move. A watched-folder integration monitored a Windows shared drive and redacted each document automatically as staff saved it.
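A watched-folder integration of this kind can be approximated with a simple polling pass. The sketch below is an assumption about how such a loop might look, not the integration that was delivered: `pending_documents`, `process_once`, and the injected `redact_file` callable are hypothetical names, and the actual call into Philter is deliberately left to the caller.

```python
from pathlib import Path
from typing import Callable, List


def pending_documents(watch_dir: Path, out_dir: Path) -> List[Path]:
    """Return .docx files that have no up-to-date redacted copy in out_dir."""
    pending = []
    for src in sorted(watch_dir.glob("*.docx")):
        # Word writes "~$name.docx" lock files next to open documents; skip them.
        if src.name.startswith("~$"):
            continue
        dst = out_dir / src.name
        if not dst.exists() or dst.stat().st_mtime < src.stat().st_mtime:
            pending.append(src)
    return pending


def process_once(
    watch_dir: Path,
    out_dir: Path,
    redact_file: Callable[[Path, Path], None],
) -> int:
    """Redact every pending document once; return how many were handled."""
    out_dir.mkdir(parents=True, exist_ok=True)
    docs = pending_documents(watch_dir, out_dir)
    for src in docs:
        # redact_file is a stand-in for whatever sends the document for redaction.
        redact_file(src, out_dir / src.name)
    return len(docs)
```

Calling `process_once` on a timer, or from a file-system notification, gives the save-and-forget behavior described above. Comparing modification times means a document that staff edit and save again is picked up on the next pass.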

Result

On the firm's annotated sample, Philter identified 100% of the SSNs, TINs, financial account numbers, and birthdates present, with minors' names caught by both a dictionary and the NER filter. Redaction is invisible to end users: staff save a file and the redacted version is produced automatically. The firm owns the deployment outright, with no in-house engineers required to run it.

Healthcare · Phileas · PhEye

Multilingual Patient Chatbot

Challenge

The organization operated a patient-facing chatbot that triaged symptoms and routed conversations to clinical staff. Patients routinely typed Social Security numbers, dates of birth, medication names tied to specific conditions, and insurance member IDs into the chat window. The system needed to handle both English and French input with equal accuracy, and redaction had to happen in real time before messages were persisted or forwarded to a human agent.

Solution

Phileas was embedded directly into the chatbot's message-processing layer. Pattern-based filters handled structured identifiers (SSNs, phone numbers, dates of birth, insurance IDs) in both languages, while PhEye's NLP models detected unstructured PHI such as names, addresses, and clinical references that patterns alone would miss. Language detection routed each message to the appropriate model. The entire stack ran inside the organization's cloud with no data leaving their perimeter.
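The flow described above, with pattern filters first and then a language-specific model, might be wired together roughly as follows. Everything here is a hypothetical sketch: the stop-word heuristic stands in for a real language detector, the single SSN pattern stands in for the full set of structured filters, and the `models` callables stand in for the NLP models. None of it is the Phileas or PhEye API.

```python
import re
from typing import Callable, Dict

# Toy stop-word lists; a production system would use a real language detector.
FRENCH_HINTS = {"le", "la", "les", "je", "est", "et", "mon", "ma", "un", "une", "né", "née"}
ENGLISH_HINTS = {"the", "is", "my", "and", "was", "i", "a", "born", "on"}

WORD_RE = re.compile(r"[a-zàâçéèêëîïôûùüÿœ]+")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def detect_language(message: str) -> str:
    """Return 'fr' or 'en' by counting stop-word hits (defaults to 'en')."""
    words = set(WORD_RE.findall(message.lower()))
    french = len(words & FRENCH_HINTS)
    english = len(words & ENGLISH_HINTS)
    return "fr" if french > english else "en"


def redact_structured(message: str) -> str:
    """Language-independent pattern pass; only SSNs are shown here."""
    return SSN_RE.sub("[REDACTED]", message)


def process_message(message: str, models: Dict[str, Callable[[str], str]]) -> str:
    """Run the pattern pass, then hand the text to the model for its language."""
    language = detect_language(message)
    structured = redact_structured(message)
    return models[language](structured)
```

Running the pattern pass before the model means the structured identifiers are already masked by the time the text reaches the language-specific step, and the whole function can run inline before a message is persisted or forwarded.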

Result

Redaction runs inline with under 100 ms of latency per message. English and French inputs are both handled without the end user having to select a language. Chat transcripts stored for analytics and quality assurance contain no recoverable PHI, satisfying the organization's HIPAA and privacy obligations.

Healthcare · Philter

EHR-to-Database Data Pipeline

Challenge

Clinical notes, discharge summaries, and radiology reports were extracted from an EHR and streamed through an AWS data pipeline into a database used by research and analytics teams. The narrative text contained dense PHI: patient names, physician names, dates, facilities, and medical record numbers embedded in free-form prose. The organization needed to de-identify this text before it reached the analytics database so downstream consumers could work with the data without HIPAA restrictions.

Solution

Philter was deployed as a service within the AWS pipeline. As documents flowed from the EHR extract into the pipeline, each record's narrative fields were sent to Philter for redaction before being written to the analytics database. Philter's combination of pattern matching and NLP handled both the structured identifiers (MRNs, dates, phone numbers) and the unstructured names and locations that appear unpredictably in clinical prose. The deployment ran entirely within the organization's VPC with no data leaving their AWS environment.
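The per-record step described above, redacting only the narrative fields before the write, can be sketched as a small pure function. The function name, the field names, and the injected `redact` callable are assumptions for illustration; in a deployment like this one the callable would wrap a request to the Philter service.

```python
from typing import Any, Callable, Dict, Iterable


def redact_record(
    record: Dict[str, Any],
    narrative_fields: Iterable[str],
    redact: Callable[[str], str],
) -> Dict[str, Any]:
    """Return a copy of the record with its narrative fields redacted.

    Fields that are missing, empty, or not strings are left untouched,
    and the input record itself is never modified.
    """
    cleaned = dict(record)
    for field in narrative_fields:
        value = cleaned.get(field)
        if isinstance(value, str) and value:
            cleaned[field] = redact(value)
    return cleaned
```

Returning a copy rather than mutating the record keeps the original extract intact for retries, and passing `redact` in as a parameter lets the pipeline step be tested without a running redaction service.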

Result

The analytics database receives fully de-identified text. Research and analytics teams query the data without individual HIPAA access controls on every row. The pipeline processes thousands of documents daily with redaction adding minimal latency to the overall flow.

Have a similar problem?

Tell us about your data pipeline, your compliance requirements, and where PII is getting through. We'll show you how to stop it.