Trino queries data lakes, warehouses, and relational sources in place, without first moving the data into one system. That federation is also a privacy problem: sensitive data in any connected source can land in the result set of any query a user runs. The Phileas Trino connector addresses this by making column-level PII masking a SQL primitive, so result rows reach the user already redacted.

This guide walks through installing the connector, configuring a policy, and redacting PII in SQL. For the background on why redaction belongs at the query layer and how it compares to the alternatives, see the companion post "PII Masking and Redaction in Trino with the Phileas Connector".

How it works

The connector is an Apache 2.0-licensed plugin (Maven coordinates ai.philterd:phileas-connector) that registers one scalar SQL function:

phileas_redact(varchar) -> varchar

It takes a string value, applies the redaction policy you configured, and returns the redacted string. Because it is a scalar function rather than a connector exposing its own tables, it composes with everything Trino does: SELECT, JOIN, CREATE VIEW, CREATE TABLE AS SELECT, CTEs, and subqueries.

Under the hood the connector embeds the Phileas library inside Trino’s worker JVMs. Redaction runs in-process on each worker as rows are processed, so there is no external service, no per-row API call, and no text leaving the cluster.

Prerequisites

  • A running Trino cluster. The connector tracks Trino’s release numbering; an artifact version such as 481 aligns with Trino 481.
  • Java 25, if you build the connector from source.
  • A Phileas redaction policy file describing what to redact (covered below).

Install the connector

Build the plugin (or use the released artifact), copy it into Trino’s plugin directory, and add a catalog.

# 1. Build the plugin
git clone https://github.com/philterd/phileas-trino-connector
cd phileas-trino-connector
mvn -DskipTests package
# produces target/phileas-connector-<version>/ (the plugin directory)

# 2. Copy it into the Trino plugin directory on every node
cp -r target/phileas-connector-<version> "$TRINO_HOME/plugin/phileas"

Then create the catalog file etc/catalog/phileas.properties, pointing it at your policy file:

connector.name=phileas
phileas.policy.file=/etc/phileas/policy.json

Restart Trino. Creating the phileas catalog loads the policy and registers phileas_redact. The function is available globally: you do not query the phileas catalog directly. Instead, you call the function on columns from your other catalogs.
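
A quick way to confirm the installation after the restart is to look the function up and call it on a literal. This is a minimal sketch, not a documented verification procedure: it assumes phileas_redact is listed alongside Trino's other functions, and the sample string is made up. What comes back depends entirely on the policy you configured.

```sql
-- Assumption: the function appears in the global function list.
SHOW FUNCTIONS LIKE 'phileas%';

-- Call it on a literal with synthetic values; no table is needed.
SELECT phileas_redact('Reach Jane at jane.doe@example.com or 555-123-4567') AS redacted;
```

If the first statement returns no rows, checking that the plugin directory exists on every node and that the catalog file names connector.name=phileas is a reasonable first step.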

Configure the policy

phileas.policy.file is a standard Phileas redaction policy: an identifiers object naming the entity types to detect, each with a filter strategy. Here is a policy that redacts emails, phone numbers, SSNs, and names with labeled placeholders:

{
  "name": "trino",
  "identifiers": {
    "emailAddress": {
      "emailAddressFilterStrategies": [ { "strategy": "REDACT", "redactionFormat": "[EMAIL]" } ]
    },
    "phoneNumber": {
      "phoneNumberFilterStrategies": [ { "strategy": "REDACT", "redactionFormat": "[PHONE]" } ]
    },
    "ssn": {
      "ssnFilterStrategies": [ { "strategy": "REDACT", "redactionFormat": "[SSN]" } ]
    },
    "firstName": {
      "firstNameFilterStrategies": [ { "strategy": "REDACT", "redactionFormat": "[NAME]" } ]
    },
    "surname": {
      "surnameFilterStrategies": [ { "strategy": "REDACT", "redactionFormat": "[NAME]" } ]
    }
  }
}

Each type can use a different strategy: MASK to replace characters with a mask, LAST_4 to keep the last four digits of a card number, encryption for reversible output, and more. See Writing your first redaction policy and the policy schema guide for the full set, or start from a ready-made file in the policy library.

Redact in SQL

With the function registered, apply it to any varchar column.

Redact columns in a SELECT, keeping the raw name alongside for comparison:

SELECT
    full_name,
    phileas_redact(full_name) AS full_name_redacted,
    phileas_redact(email)     AS email_redacted,
    phileas_redact(ssn)       AS ssn_redacted
FROM postgresql.public.customers;

Redact within a join, including a free-text column from a second table:

SELECT
    c.id,
    phileas_redact(c.full_name)  AS customer,
    phileas_redact(t.transcript) AS transcript_redacted
FROM postgresql.public.customers c
JOIN postgresql.public.support_transcripts t
  ON t.customer_id = c.id;

Expose a redacted view, then grant access to the view instead of the underlying table, so consumers only ever see clean data:

CREATE VIEW analytics.customers_redacted AS
SELECT
    id,
    phileas_redact(full_name) AS full_name,
    phileas_redact(email)     AS email,
    phileas_redact(ssn)       AS ssn
FROM postgresql.public.customers;
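
Because phileas_redact is an ordinary scalar function, the same pattern should carry over to CREATE TABLE AS SELECT and to CTEs. The sketch below illustrates both. The hive.analytics target is a hypothetical writable catalog and schema, not something the connector provides, and the '[EMAIL]' placeholder assumes a policy that redacts emails with that format.

```sql
-- Materialize a redacted copy into another catalog.
-- Assumption: hive.analytics is a writable catalog/schema in your cluster.
CREATE TABLE hive.analytics.customers_redacted AS
SELECT
    id,
    phileas_redact(full_name) AS full_name,
    phileas_redact(email)     AS email,
    phileas_redact(ssn)       AS ssn
FROM postgresql.public.customers;

-- Redact inside a CTE, then filter on the redacted text.
-- Assumption: the policy replaces emails with the literal [EMAIL].
WITH redacted AS (
    SELECT
        customer_id,
        phileas_redact(transcript) AS transcript
    FROM postgresql.public.support_transcripts
)
SELECT customer_id, transcript
FROM redacted
WHERE transcript LIKE '%[EMAIL]%';
```

A materialized copy is redacted once at write time, so it will not reflect later policy changes until it is rebuilt, whereas a view applies the current policy on every read.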

Run the demo

The connector repository ships a one-command, self-contained demo: docker compose up stands up Trino with the connector and a Postgres source seeded with synthetic PII, and the demo README has copy-paste versions of the queries above showing before and after output. It is the fastest way to see the connector working before wiring it into your own cluster.

Operating considerations

  • Policy distribution. The policy file path is read per worker, so place the policy at the same path on every node (a shared mount, configuration management, or baked into the image).
  • Reloading the policy. The connector reads the policy once at startup. After editing it, restart Trino for the change to take effect.
  • Detection scope. Pattern-based types (emails, phone numbers, SSNs, card numbers) and dictionary-based names run in-process with no external dependency. Dictionary name detection catches common names, not every name; for broader name detection Phileas can call PhEye, a separate model service.
  • Validate on your own data. Detection is probabilistic and configurable. Test your policy against representative documents and confirm the output before relying on it; you remain responsible for what reaches your query results.
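
One way to approach the validation step above is to run the redacted output back through simple patterns and count anything that still looks like PII. This is a rough sketch using Trino's regexp_like and count_if. The two regular expressions are illustrative assumptions rather than part of Phileas, and a zero count only means these particular patterns found nothing.

```sql
-- Count rows whose redacted output still matches a simple PII pattern.
-- Assumption: these patterns are illustrative, not exhaustive.
SELECT
    count(*) AS rows_checked,
    count_if(regexp_like(
        phileas_redact(email),
        '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
    )) AS possible_email_leaks,
    count_if(regexp_like(
        phileas_redact(ssn),
        '\d{3}-\d{2}-\d{4}'
    )) AS possible_ssn_leaks
FROM postgresql.public.customers;
```

Non-zero counts point to rows worth inspecting by hand. Names and free text are harder to check with patterns, so spot-reading a sample of redacted transcripts is still worthwhile.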

Frequently asked questions

Does the connector send data to an external service?
No. The connector embeds the Phileas library directly in Trino's worker JVMs, so redaction happens in-process as rows are scanned. There is no network hop, no per-row API call, and no text leaving your cluster.

What does it redact?
Whatever your policy file says. The connector applies a standard Phileas redaction policy, so it covers the same entity types and filter strategies (redact, mask, encrypt, pseudonymize, and more) as Philter and Phileas. Pattern-based and dictionary-based detection run entirely in-process; model-based name detection through PhEye is a separate service.

How do I change what gets redacted?
Edit the policy file the catalog points at. The connector reads the policy once at startup, so restart Trino after editing it for the change to take effect.

Is it open source?
Yes. The connector is Apache 2.0 (phileas-trino-connector), as is Phileas. You run it inside your own Trino cluster.