Sensitive data discovery scanner

Phinder

Phinder is a high-speed discovery scanner that crawls files and directories to map where sensitive information actually lives across your environment. It's the step that comes before redaction. You can't protect what you can't find.

Point Phinder at a directory. Find every PII entity.

$ phinder scan /data/patient-records/ \
    --policy healthcare.json --format json

[
  { "file": "intake-0942.pdf",   "SSN": 3, "NAME": 7, "DOB": 2 },
  { "file": "lab-report-1247.docx", "SSN": 1, "PHONE": 2, "MRN": 1 }
]

Find PII across your files

Point Phinder at a directory and it walks every file, classifying the sensitive values inside each one so you know exactly where PII and PHI live before you redact.

$ phinder scan /data/records

customers_2024.csv        EMAIL×1,204  PHONE×1,204  SSN×3
support/transcripts.txt   PERSON×88  EMAIL×42  CREDIT_CARD×2
hr/onboarding.docx        PERSON×15  SSN×15  DATE×15
logs/app-2024-06.log      IP_ADDRESS×9,321  EMAIL×47
archive/README.md         no PII found

Sensitive values found in 4 of 5 files. Every match maps to the same entity types Philter will redact.

Why Phinder

Built for scale

Designed for terabytes of unstructured storage. Parallel workers, streaming I/O, and bounded memory so a discovery job never takes down the host it's running on.

Filesystem crawler

Native crawling of local filesystems with support for many document formats. Same policy, same output format, no matter where in your directory tree the files live.

Shared policies with Philter

Define a policy once. Phinder uses it to discover; Philter uses it to redact. The entity types you found are the entity types you redact, with no drift between detection and action.

Audit-ready reports

JSON, CSV, or human-readable summaries. Inventory the entity types per file and per directory: exactly the artifacts auditors ask for.
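
If you want a per-directory view of a JSON report in your own tooling, a roll-up along these lines may be a useful starting point. This is a minimal sketch, not part of Phinder: it assumes the report is a list of objects with a "file" key plus one integer count per entity type, as in the sample output on this page. The function name rollup_by_directory and the sample data are hypothetical.

```python
import json
from collections import Counter, defaultdict
from pathlib import PurePosixPath


def rollup_by_directory(report):
    """Sum per-file entity counts into per-directory totals.

    Assumes each entry looks like {"file": "<path>", "<ENTITY>": <count>, ...}.
    """
    totals = defaultdict(Counter)
    for entry in report:
        # Files at the top level of the scan are grouped under ".".
        directory = str(PurePosixPath(entry["file"]).parent)
        for key, value in entry.items():
            if key != "file":
                totals[directory][key] += value
    return {directory: dict(counts) for directory, counts in totals.items()}


# Illustrative data only, shaped like the sample JSON output on this page.
sample = json.loads("""
[
  {"file": "hr/onboarding.docx", "PERSON": 15, "SSN": 15},
  {"file": "hr/offer-letter.pdf", "PERSON": 2, "SSN": 1},
  {"file": "customers_2024.csv", "EMAIL": 1204, "SSN": 3}
]
""")

for directory, counts in sorted(rollup_by_directory(sample).items()):
    print(directory, counts)
```

The roll-up only groups by the immediate parent directory. If the real report schema differs from the sample, adjust the key handling accordingly.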

Search-result redaction

Search Redact brings the same Phileas detection to OpenSearch and Elasticsearch, redacting sensitive information from search results before they leave the cluster. Same engine, different surface.

Compounds with the rest of the toolkit

Discovery without redaction is just inventory. Pair Phinder with Philter (to remediate what was found) and Phield (to keep watching what was missed) for a complete PII lifecycle.

Frequently asked questions

If something here isn’t covered, get in touch and we’ll answer.

What is Phinder?

Phinder is a high-speed discovery scanner. Point it at a directory or file share and it crawls the content, detects PII and PHI, and reports which entity types live in which files. It is read-only: the job is to map where sensitive data is, not to change it. Think of it as the inventory step that comes before redaction. You can't protect what you can't find.

How is Phinder different from Philter?

Phinder finds and inventories sensitive data; Philter redacts it. Phinder answers "where is the PII, and what kinds," while Philter acts on it. Because both read the same policy, the entity types Phinder discovers are exactly the entity types Philter will redact, with no drift between detection and remediation.

What storage and file types can Phinder scan?

Phinder crawls local filesystems and reads many document formats. The same policy and the same output format apply across every directory it scans, so reports stay consistent no matter where the files sit.

Does Phinder modify or move my data?

No. Phinder is read-only discovery. It reports the entity types it found per file and per directory, and it never redacts, rewrites, or copies your source data anywhere. Remediation is Philter's job; Phinder's job is to tell you what is there and where.

Is Phinder open source?

Yes. Phinder is open source under the permissive Apache License, Version 2.0, and the code is on GitHub. Run it wherever your data lives, with no per-seat fees and no vendor lock-in.

Ready to use Phinder?

Grab the open-source code and run it yourself, or work with our team directly. Pick the path that fits.