The Phileas Trino Connector Is Now Open Source

The Phileas Trino connector is now public on GitHub under the Apache 2.0 license. It is a Trino plugin that exposes Phileas redaction as a scalar SQL function, so you can redact PII inside any varchar column as part of a normal query.

We introduced the idea earlier this year in The Phileas Trino Connector: PII Redaction as SQL, which covers the architecture and the query patterns in depth. This post is the release announcement: the repository is published, the artifact is on Maven Central, and you can install it today. Read that earlier post for the architectural background. Read this one to get the plugin running.

The problem it solves

Trino (and its commercial distribution, Starburst) is a federated query engine. Its whole point is to query across data lakes, warehouses, and relational databases without first copying everything into one place. That federation is also a privacy problem: sensitive data that was carefully gated in one source can land in the result set of any query the moment Trino joins it with something else.

There is no standard way to redact that data at query time. Your options are to build an application layer in front of Trino, run a separate ETL stage to produce a clean copy, or accept that raw PII flows into result sets. The connector takes a different approach and makes redaction a SQL primitive: redaction happens inside Trino’s compute layer, in process on the worker JVMs, with no external service and no upstream data modification.

How it works

The connector registers a scalar function in Trino:

phileas_redact(varchar) -> varchar

The function takes a string value, applies Phileas detection and redaction, and returns the cleaned string. Because it is a scalar function rather than a connector that exposes its own tables, it composes with everything else Trino does: SELECT, JOIN, INSERT INTO, CREATE TABLE AS SELECT, views, CTEs, and window functions. The Phileas library is embedded directly in the worker JVMs, so there is no network hop and no per-row API call.

Installation

The connector follows the standard Trino plugin layout: a plugin directory under $TRINO_HOME/plugin/ and a catalog properties file under $TRINO_HOME/etc/catalog/. The artifact version tracks Trino’s release numbering (version 481 aligns with Trino 481).

Building from source needs Java 25 and Maven 3.9.x. The artifact is also published to Maven Central as ai.philterd:phileas-connector.

# 1. Clone and build the plugin
git clone https://github.com/philterd/phileas-connector
cd phileas-connector
mvn clean package

# 2. Deploy the plugin into Trino's plugin directory
# ${TRINO_HOME:?} aborts if TRINO_HOME is unset, so rm -rf never targets /plugin/phileas
rm -rf "${TRINO_HOME:?}/plugin/phileas"
cp -r ./target/phileas-connector-481 "$TRINO_HOME/plugin/phileas"

# 3. Register the catalog
mkdir -p $TRINO_HOME/etc/catalog
cat > $TRINO_HOME/etc/catalog/phileas.properties << 'EOF'
connector.name=phileas
phileas.policy.file=/etc/trino/policies/default-policy.json
EOF

# 4. Start (or restart) Trino
cd $TRINO_HOME
bash bin/launcher run

In a multi-node cluster the plugin directory and the policy file path must be identical on every worker, because redaction runs in process on each one. Use a configuration-management deploy, a shared mount, or a baked container image so the plugin and policy are in the same place everywhere.
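
One way to check that requirement is to compute a single digest of the plugin directory on each node and compare the strings. The sketch below is an illustration, not part of the connector: tree_digest is a hypothetical helper, and it assumes GNU coreutils and findutils (sha256sum, sort -z, xargs -r) are installed on every worker.

```shell
# Hypothetical helper, not shipped with the connector.
# Prints one SHA-256 digest that summarizes every file under a directory
# (relative paths plus contents), so two nodes can be compared with one string.
tree_digest() {
  (
    cd "$1" || exit 1
    # LC_ALL=C keeps the file order identical across nodes with different locales.
    find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum | sha256sum | cut -d' ' -f1
  )
}

# On each worker, using the paths from the install steps:
#   tree_digest "$TRINO_HOME/plugin/phileas"
#   sha256sum /etc/trino/policies/default-policy.json
```

If any worker reports a different digest for the plugin directory or the policy file, redeploy that node before sending queries to it.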

Configuration

Two properties drive the catalog file:

  • connector.name must be phileas. This is what tells Trino which plugin to load.
  • phileas.policy.file is the path to the Phileas policy file that defines what to redact.

The policy file is the same Phileas policy format used everywhere else in the toolkit. You can author it by hand, generate it from PhiSQL, or build it visually with the Redaction Policy Editor and export the JSON.

A worked example

Suppose a healthcare provider stores intake notes as free text in a Postgres table that Trino reads through a postgres catalog. The note column contains the kind of identifiers HIPAA Safe Harbor calls out: names, contact details, and so on. Analysts need the clinical content of those notes, not the patient identifiers.

Start with the raw value:

trino> SELECT note FROM postgres.intake.notes WHERE note_id = 4821;

                                       note
----------------------------------------------------------------------------------
 Patient reports follow-up needed. Reachable at jane.doe@example.com for scheduling.
(1 row)

Wrap the column in phileas_redact and the email address is masked before the row ever reaches the client:

trino> SELECT phileas_redact(note) AS note_redacted
       FROM postgres.intake.notes
       WHERE note_id = 4821;

                                  note_redacted
----------------------------------------------------------------------------------
 Patient reports follow-up needed. Reachable at **************** for scheduling.
(1 row)

The same call composes into the patterns that make the connector useful in practice: redact a column on the way out of a federated JOIN, materialize a clean table with CREATE TABLE AS SELECT, or wrap the redaction in a view and grant analytics teams access to the view instead of the underlying table. Those patterns are covered in detail in the earlier post.
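
As a rough sketch of the materialize and view patterns, the script below writes two statements to a file that the Trino CLI can run. Everything outside the worked example is an assumption: the lake catalog, the analytics schema, and the target table and view names are hypothetical, and the target catalog is assumed to support both CREATE TABLE AS and views (not every connector does). The server URL in the final comment is a placeholder.

```shell
# Sketch only: "lake.analytics" and the object names under it are hypothetical.
# Assumes the target catalog supports CREATE TABLE AS and views.
write_redaction_sql() {
  cat > "$1" << 'EOF'
-- Materialize a clean copy of the notes.
CREATE TABLE lake.analytics.notes_clean AS
SELECT note_id, phileas_redact(note) AS note
FROM postgres.intake.notes;

-- Or expose the redacted column through a view and grant access to the view.
CREATE OR REPLACE VIEW lake.analytics.notes_redacted AS
SELECT note_id, phileas_redact(note) AS note
FROM postgres.intake.notes;
EOF
}

sql_file="$(mktemp)"
write_redaction_sql "$sql_file"

# Run the statements with the Trino CLI (left commented so the sketch has no side effects):
#   trino --server https://trino.example.com --file "$sql_file"
```

The view variant keeps redaction at read time, so it picks up a new policy after the next worker restart. The table variant freezes the redacted values as of the moment it was built.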

Where the project stands today

This is an early open source release, and we would rather be precise about what is wired up than oversell it. The plugin, the function registration, and the in-process redaction path all work. The phileas.policy.file configuration property is in place and parsed, and the current build ships with email-address masking as the active redaction behavior while the full policy-file loading path is being completed. Track the repository for the release that reads the entire Phileas policy surface (per-entity strategies, custom dictionaries, conditional rules, and language settings) from that file.

If you want the full policy surface in a SQL context today, the HTTP-based Philter pattern (an external service that loads any policy on demand) covers it now, and the connector is closing the gap so the same policies run in process inside Trino.

Operational considerations

A few things to plan for before standing this up in production:

  • Per-query overhead. Redaction runs in process on each worker with no serialization beyond the normal column-value pipeline, so the cost is essentially Phileas’s own detection time: sub-millisecond for pattern detectors, single-digit milliseconds when NLP detectors fire. Trino distributes the work across all workers, so throughput scales with the cluster.
  • Policy caching. The policy is loaded per worker. A large policy initializes once on first use; subsequent queries on that worker are warm.
  • Hot reload. The connector loads its configuration at startup, so policy changes currently take effect on a worker restart rather than live. For workloads where the policy changes frequently, the external Philter service is the better fit.
  • Error behavior. The phileas_redact function is null-safe: a null input returns null, and if redaction throws, the function logs the error and returns null rather than failing the query. Validate that a missing or unreadable policy file surfaces in your monitoring, because a query that silently returns nulls is worse than one that fails loudly. Treat the policy file as code and test it on every change with Philter Scope.
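
Because a swallowed error looks like a non-null input that comes back null, a scheduled check can count exactly those rows. The function below is a minimal sketch, not something the connector provides: check_silent_nulls and the TRINO_CLI override are hypothetical names, and it assumes the Trino CLI is on the PATH and already configured to reach your coordinator.

```shell
# Hypothetical monitoring check, not part of the connector.
# Counts rows whose input is non-null but whose redacted value is null,
# which is the signature of a redaction error that was logged and swallowed.
# TRINO_CLI can point at a different executable; it defaults to "trino".
check_silent_nulls() {
  table="$1"
  column="$2"
  count="$("${TRINO_CLI:-trino}" --output-format CSV_UNQUOTED --execute \
    "SELECT count(*) FROM ${table} WHERE ${column} IS NOT NULL AND phileas_redact(${column}) IS NULL")" || return 2

  # Refuse to interpret anything that is not a plain non-negative integer.
  case "$count" in
    ''|*[!0-9]*) echo "ERROR: unexpected CLI output: ${count}" >&2; return 2 ;;
  esac

  if [ "$count" -gt 0 ]; then
    echo "ALERT: ${count} rows in ${table} returned null from phileas_redact" >&2
    return 1
  fi
  echo "OK: no unexpected nulls in ${table}"
}

# Example call against the table from the worked example:
#   check_silent_nulls postgres.intake.notes note
```

The query runs redaction over every row it scans, so on a large table it is worth restricting it to a recent partition or a sample.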

Get it

If you are standing up Trino-resident redaction for the first time, or want help tuning a policy against your real data, get in touch.