Readable by humans
REDACT SSN WITH MASK says exactly what it does. Policies become something a compliance reviewer can read in a pull request, not a wall of nested JSON.
The query language for PII operations
PhiSQL is a declarative query language for PII privacy operations across the Philterd toolkit. Write a few readable lines instead of hand-editing JSON, and PhiSQL compiles them to the same Phileas policy schema that Philter and Phileas already run. One language for your PII policies, version-controlled and reviewable like any other code. Read the launch announcement for the full story.
The same policy drives detection and redaction in Philter and Phileas. Author once in PhiSQL; run everywhere the JSON policy runs.
Express HIPAA Safe Harbor, PCI DSS scope reduction, or court-filing redaction as a short, auditable policy. The shipped examples mirror the rules auditors cite.
.phisql files diff cleanly in Git. Policy changes go through the same review and CI pipeline as the rest of your code.
PhiSQL compiles to the Phileas JSON your stack already executes. Adopt it for authoring without changing anything downstream.
The specification, the grammar, the Java, Python, and .NET reference implementations, and the examples are all open source under the permissive Apache 2.0 license.
PhiSQL is defined as an open specification, proven by independent reference implementations in three languages. The specification, the implementations, and the worked examples all live in one Apache-2.0 repository.
Versioned under spec/v1.0/: an ANTLR4 grammar and EBNF, a catalog of entity types, strategies, keywords, and predicates, plus worked examples that pair each .phisql file with the JSON it compiles to.
Independent parsers and compilers in Java, Python, and .NET that all emit the same Phileas JSON for the same input. They share one catalog as the single source of truth, so none can drift from the spec or from each other.
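One way to check this kind of agreement in CI is to compare the compilers' outputs as parsed JSON rather than as raw text, so key order and whitespace do not cause false mismatches. The sketch below assumes each compiler's output has already been captured as a string. The helper name `outputs_agree` and the sample documents are hypothetical placeholders and do not reflect the real Phileas policy schema.

```python
import json


def outputs_agree(*json_texts: str) -> bool:
    """Return True when every JSON document parses to the same value."""
    if not json_texts:
        return True
    parsed = [json.loads(text) for text in json_texts]
    return all(doc == parsed[0] for doc in parsed[1:])


# Placeholder documents standing in for the output of three compilers.
# They differ only in key order and whitespace.
java_out = '{"name": "ssn_only", "rules": ["ssn"]}'
python_out = '{"rules": ["ssn"], "name": "ssn_only"}'
dotnet_out = '{\n  "name": "ssn_only",\n  "rules": ["ssn"]\n}'

print(outputs_agree(java_out, python_out, dotnet_out))
```

Comparing parsed values keeps the check independent of how each implementation happens to serialize its output.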
PhiSQL never adds capabilities the JSON schema does not already have. Anything you express in PhiSQL maps cleanly to a Phileas policy, so there is no lock-in and no second source of truth.
Three independent parsers and compilers, one per language, all emitting the same Phileas JSON for the same PhiSQL input. Pick the one that matches your stack.
The original reference implementation, available from Maven Central as a Java library you can build locally or add as a dependency.
A Python port that mirrors the specification grammar rule for rule and compiles to the same Phileas JSON as the others.
A C# port targeting .NET, shipping a library and a command-line compiler that produce the same Phileas JSON as the others.
PhiSQL is an authoring layer, not a new runtime. Your .phisql source compiles to a standard Phileas JSON policy, which Philter, Phileas, and the rest of the toolkit already execute. The redaction policy JSON schema stays the source of truth: Phileas JSON leads, PhiSQL follows.
PhiSQL covers redaction (REDACT, DEIDENTIFY, IGNORE), custom regex identifiers (DEFINE IDENTIFIER), AI/NER detection (DETECT PHEYE), and date shifting (SHIFT). Each query below is a complete, working policy drawn from the specification's worked examples.
Every example compiles to a standard Phileas JSON policy. The same rules you would otherwise hand-write in JSON, expressed in a few readable lines.
-- Minimal example: redact U.S. Social Security Numbers.
POLICY ssn_only;
REDACT SSN WITH MASK;
-- HIPAA Safe Harbor de-identification (45 CFR 164.514(b)(2)).
POLICY hipaa_safe_harbor
    DESCRIPTION 'HIPAA Safe Harbor de-identification.';
DEIDENTIFY
    PHYSICIAN_NAME AS RANDOM_REPLACE,
    HOSPITAL AS RANDOM_REPLACE,
    DATE AS TRUNCATE,
    AGE AS REDACT,
    SSN AS REDACT,
    PHONE_NUMBER AS REDACT,
    EMAIL_ADDRESS AS REDACT,
    STREET_ADDRESS AS REDACT,
    CITY AS REDACT,
    STATE AS REDACT,
    ZIP_CODE AS REDACT;
-- PCI DSS v4.0 Req 3.2-3.4: PAN to last 4 only.
-- A WHERE predicate gates the rule on detection confidence.
POLICY pci_dss_scope_reduction
    DESCRIPTION 'PCI DSS v4.0 scope reduction.';
REDACT CREDIT_CARD WITH LAST_4 WHERE CONFIDENCE > 0.85;
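To make the intent of the PCI rule concrete, here is a minimal Python sketch of a "keep the last four digits" transform gated by a confidence threshold. It illustrates the idea only and is not how Phileas implements LAST_4. The function names, the `*` mask character, and the (text, confidence) tuple shape are all assumptions made for the example.

```python
def keep_last_4(value: str, mask_char: str = "*") -> str:
    """Mask every digit except the final four, leaving separators untouched."""
    total_digits = sum(ch.isdigit() for ch in value)
    digits_to_mask = max(total_digits - 4, 0)
    out = []
    for ch in value:
        if ch.isdigit() and digits_to_mask > 0:
            out.append(mask_char)
            digits_to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def apply_rule(detections, threshold: float = 0.85):
    """Transform only detections whose confidence is strictly above the threshold."""
    return [
        keep_last_4(text) if confidence > threshold else text
        for text, confidence in detections
    ]


# Hypothetical detections: (matched text, detector confidence).
sample = [("4111 1111 1111 1111", 0.95), ("4111111111111111", 0.60)]
print(apply_rule(sample))
```

The strict comparison mirrors the `>` in the policy: a detection at exactly 0.85 would be left alone in this sketch.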
-- Customer support tickets, with an allowlist for company names.
POLICY support_tickets
    DESCRIPTION 'Customer support ticket redaction with allowlist.';
REDACT FIRST_NAME, SURNAME WITH STATIC_REPLACE(value='Customer', scope=document);
REDACT EMAIL_ADDRESS WITH MASK;
REDACT PHONE_NUMBER WITH MASK;
IGNORE TERMS ('Acme', 'AcmeCorp') FOR FIRST_NAME;
IGNORE TERMS ('Corp', 'Support', 'Engineering') FOR SURNAME;
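The effect of the two IGNORE TERMS clauses can be pictured as a per-entity-type allowlist applied to detections before redaction. The Python sketch below assumes exact, case-sensitive matching and a simple (entity type, text) tuple for each detection. Both are assumptions for illustration, and the actual matching rules are whatever the Phileas policy defines.

```python
# Allowlisted terms per entity type, copied from the policy above.
IGNORED = {
    "FIRST_NAME": {"Acme", "AcmeCorp"},
    "SURNAME": {"Corp", "Support", "Engineering"},
}


def drop_ignored(detections):
    """Remove detections whose text is allowlisted for their entity type."""
    return [
        (entity_type, text)
        for entity_type, text in detections
        if text not in IGNORED.get(entity_type, set())
    ]


# Hypothetical detector output for one support ticket.
found = [
    ("FIRST_NAME", "Acme"),
    ("FIRST_NAME", "Dana"),
    ("SURNAME", "Support"),
    ("SURNAME", "Okafor"),
]
print(drop_ignored(found))
```

Because the allowlist is keyed by entity type, "Support" is ignored only when it was detected as a surname.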
-- Format-preserving encryption keeps the surface format of an
-- identifier while making the value cryptographically opaque.
POLICY fpe_ssn;
REDACT SSN WITH FPE_ENCRYPT;
-- Define a custom regex identifier and redact what it matches,
-- here a medical record number like "MRN: 12345".
POLICY custom_identifier;
DEFINE IDENTIFIER 'MRN' MATCHING '\bMRN[\s:#]*\d{5,}\b' CASE INSENSITIVE
    WITH REDACT(format='{{{REDACTED-MRN}}}');
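Before committing a custom identifier, it can help to try the pattern against sample text. The Python sketch below applies the same regular expression with case-insensitive matching and substitutes the replacement string from the policy. It assumes the pattern behaves the same under Python's `re` module as it does in the engine that runs the policy, and the helper name `redact_mrn` is hypothetical.

```python
import re

# Same pattern as the DEFINE IDENTIFIER statement, compiled case-insensitively.
MRN_PATTERN = re.compile(r"\bMRN[\s:#]*\d{5,}\b", re.IGNORECASE)


def redact_mrn(text: str) -> str:
    """Replace every MRN match with the policy's replacement string."""
    return MRN_PATTERN.sub("{{{REDACTED-MRN}}}", text)


print(redact_mrn("Patient MRN: 12345 admitted"))
print(redact_mrn("see mrn#9876543"))
print(redact_mrn("MRN: 1234"))  # only four digits, so the pattern does not match
```

The last call shows the `\d{5,}` lower bound at work: identifiers with fewer than five digits pass through unchanged.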
-- Detect person names with the PhEye AI/NER model and redact them.
-- Add ENDPOINT '<url>' to point at a specific PhEye service.
POLICY person_detection;
DETECT PHEYE LABELS ('PERSON') WITH REDACT;
-- Shift detected dates by a fixed offset (a date-only strategy).
-- Use SHIFT(random=TRUE) for a random offset instead.
POLICY date_shift;
REDACT DATE WITH SHIFT(days=30);
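The arithmetic behind a fixed-offset shift is simple calendar addition. The Python sketch below assumes dates arrive in ISO 8601 form (YYYY-MM-DD) and that a positive `days` value moves the date forward. The function name `shift_date` is hypothetical, and real detection handles many more date formats than this.

```python
from datetime import date, timedelta


def shift_date(iso_date: str, days: int) -> str:
    """Shift an ISO 8601 date by a fixed number of days."""
    original = date.fromisoformat(iso_date)
    return (original + timedelta(days=days)).isoformat()


print(shift_date("2024-01-15", 30))  # 2024-02-14
print(shift_date("2023-12-15", 30))  # 2024-01-14
```

A fixed offset preserves the intervals between dates in the same document, which is the usual reason to prefer shifting over outright redaction.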
Today PhiSQL handles policy authoring: redaction, custom identifiers, AI detection, and date shifting. Later versions extend the same language to the rest of the toolkit: discovery, monitoring, and benchmarking. The syntax below illustrates that direction and is not yet implemented.
-- Discovery (planned): inventory where PII lives.
FIND PII IN 's3://patient-records/' WHERE CONFIDENCE > 0.8;
-- Benchmarking (planned): score a policy on precision and recall.
BENCHMARK POLICY hipaa_safe_harbor AGAINST 'gold-standard/';
-- Monitoring (planned): alert on unexpected PII flow.
MONITOR PII ON 'kafka://topic/events' ALERT WHEN VOLUME > 1000;
Grammar and semantics for these statements are still being designed in the open. Follow the repository and its RFCs to weigh in.
If something here isn’t covered, get in touch and we’ll answer.
PhiSQL statements such as REDACT SSN WITH MASK say what should happen, in readable keyword-and-clause form, the same way a SQL query declares the result you want rather than the steps to get there. The "query language for PII operations" framing is deliberate: today PhiSQL authors redaction policies, and the roadmap extends the same language to querying for PII (a planned FIND PII IN ... discovery statement), along with monitoring and benchmarking. What PhiSQL is not is a SQL dialect: it does not connect to or query relational databases, and it is not a superset or extension of SQL. It has its own small grammar and compiles to a Phileas JSON policy, so if you have written SQL the style feels familiar, but there is no SQL underneath.
Reference implementations are available in Java (ai.philterd:phisql), Python, and .NET.
PhiSQL currently supports POLICY declarations; REDACT and DEIDENTIFY across entity types and strategies (including date shifting with SHIFT and TRUNCATE_TO_YEAR); WHERE predicates such as confidence thresholds; IGNORE clauses for allowlisted terms and patterns; custom regex identifiers via DEFINE IDENTIFIER ... MATCHING; and AI/NER detection (for example, person names) via DETECT PHEYE.
For Java, run mvn verify in reference/java/ or add ai.philterd:phisql as a Maven dependency. The Python implementation lives under reference/python/, and the .NET implementation under reference/dotnet/ (the Philterd.PhiSql package and phisql CLI, built with ./build.sh or the .NET 10 SDK). Each compiler turns a .phisql file into the same Phileas JSON policy.
Three ways to get going: deploy the open source yourself, spin it up from a cloud marketplace, or work with our team directly. Pick the path that fits.