There is no single technique that makes data private. Privacy engineering is choosing the right technique for each value and each use case, and being precise about what each one actually guarantees. This guide maps the common techniques to when they fit, which part of the Philterd toolkit applies them, and, just as importantly, where each one does not apply.

A useful first distinction: is the transformation meant to be reversible? Suppression and differential privacy are one-way. Pseudonymization is reversible only if you keep the mapping. Encryption is reversible by design through a governed process. Keep that axis in mind as you read.

Redaction and masking

What it is. Remove the sensitive value or replace it with a placeholder or mask characters, so the original is gone from the output.

When to use it. The default for sharing or storing text that should not contain the value at all: logs, support transcripts, documents handed to a third party.

Philterd tool. Philter and Phileas, via the REDACT, MASK, TRUNCATE, and LAST_4 strategies.

Where it does not apply. When you still need to join or correlate on the value later. A redacted value is gone, so two records that shared it can no longer be linked. Use pseudonymization for that.
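As a rough illustration of the difference between redacting, masking, and keeping the last four characters, here is a minimal Python sketch. It is not Philter or Phileas code: the regular expression, the function names, and the `[REDACTED]` placeholder are assumptions made for this example, and real detection covers far more than one pattern.

```python
import re

# Hypothetical pattern for illustration only: SSN-like values such as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every match with a fixed placeholder; the original is gone."""
    return SSN_PATTERN.sub(placeholder, text)


def mask(text: str, mask_char: str = "*") -> str:
    """Replace each digit of a match with a mask character, keeping separators."""
    return SSN_PATTERN.sub(lambda m: re.sub(r"\d", mask_char, m.group()), text)


def last_4(text: str, mask_char: str = "*") -> str:
    """Mask every digit of a match except the final four characters."""
    def keep_tail(m: re.Match) -> str:
        value = m.group()
        head, tail = value[:-4], value[-4:]
        return re.sub(r"\d", mask_char, head) + tail
    return SSN_PATTERN.sub(keep_tail, text)


line = "Customer SSN is 123-45-6789."
print(redact(line))   # Customer SSN is [REDACTED].
print(mask(line))     # Customer SSN is ***-**-****.
print(last_4(line))   # Customer SSN is ***-**-6789.
```

In all three outputs the full original value cannot be recovered, which is why none of them can be used to link two records later.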

Pseudonymization and referential integrity

What it is. Replace a value with a consistent stand-in, so the same input maps to the same replacement everywhere. Relationships and joins survive even though the real values are gone from the output. This is pseudonymization, not anonymization: if the mapping is retained, it can be reversed.

When to use it. Analytics, testing, and data sharing where you need the data to stay useful, for example keeping every record for one customer linked without exposing who they are.

Philterd tool. Phileas’s replacement strategies, with Philter’s cache-backed referential integrity keeping the mapping consistent across documents and contexts.

Where it does not apply. Do not describe this as anonymization in a compliance context. The link between stand-in and original exists wherever the mapping is held, so the data is pseudonymous, not anonymous.
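The idea of a consistent stand-in can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Phileas or Philter implement it: the keyed HMAC derivation, the `person-` prefix, the `PseudonymMap` class, and the in-memory dictionaries standing in for a cache are all hypothetical.

```python
import hashlib
import hmac

# Assumption: a secret key held by whoever controls the mapping.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonym(value: str) -> str:
    """Derive a stable stand-in: the same input always gives the same output."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"person-{digest[:12]}"


class PseudonymMap:
    """Hypothetical in-memory mapping; retaining it is what makes reversal possible."""

    def __init__(self) -> None:
        self._forward: dict[str, str] = {}
        self._reverse: dict[str, str] = {}

    def replace(self, value: str) -> str:
        if value not in self._forward:
            stand_in = pseudonym(value)
            self._forward[value] = stand_in
            self._reverse[stand_in] = value
        return self._forward[value]

    def reidentify(self, stand_in: str):
        # Only works where the mapping is held; returns None otherwise.
        return self._reverse.get(stand_in)


mapping = PseudonymMap()
orders = [("Alice Smith", 40), ("Bob Jones", 15), ("Alice Smith", 25)]
shared = [(mapping.replace(name), amount) for name, amount in orders]

# Records for the same customer still share a key, so joins survive.
totals: dict[str, int] = {}
for key, amount in shared:
    totals[key] = totals.get(key, 0) + amount
print(totals)
```

The `reidentify` method is the reason this is pseudonymization: anyone holding the mapping can get back to the original value.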

Encryption and format-preserving encryption

What it is. Replace a value with an encrypted form. Standard encryption changes the shape; format-preserving encryption (FPE) keeps the original format, so a 16-digit card number encrypts to another 16-digit string that still fits the column and downstream validation.

When to use it. When you need reversibility under control: the value must be recoverable later by an authorized, audited process, or it must keep its format to flow through systems that validate it.

Philterd tool. Philter’s CRYPTO_REPLACE and FPE_ENCRYPT_REPLACE strategies, with re-identification through the governed POST /api/reidentify endpoint. See "What is format-preserving encryption?"

Where it does not apply. Reversibility is value-by-value through the governed endpoint, not a turnkey whole-document round-trip. If a value should never be recoverable, suppress it instead.
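To make the "16 digits in, 16 digits out" property concrete, here is a toy Feistel construction over digit strings in Python. It is a teaching sketch only: it is not NIST FF1 or FF3-1, it has had no security review, and it is not the algorithm behind FPE_ENCRYPT_REPLACE. The function names, the round count, and the key are assumptions made for this example.

```python
import hashlib
import hmac

ROUNDS = 8  # arbitrary choice for this toy example


def _round_value(key: bytes, round_index: int, half: str, modulus: int) -> int:
    """Keyed round function: HMAC of the other half, reduced to the half's range."""
    message = f"{round_index}:{half}".encode("ascii")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _check(digits: str) -> int:
    if not (digits.isascii() and digits.isdigit()) or len(digits) % 2 != 0:
        raise ValueError("expected an even-length string of ASCII digits")
    return len(digits) // 2


def fpe_encrypt(digits: str, key: bytes) -> str:
    width = _check(digits)
    modulus = 10 ** width
    left, right = digits[:width], digits[width:]
    for i in range(ROUNDS):
        mixed = (int(left) + _round_value(key, i, right, modulus)) % modulus
        left, right = right, str(mixed).zfill(width)
    return left + right


def fpe_decrypt(digits: str, key: bytes) -> str:
    width = _check(digits)
    modulus = 10 ** width
    left, right = digits[:width], digits[width:]
    for i in reversed(range(ROUNDS)):
        # Undo one round: the current left half was the previous right half.
        prev_right = left
        prev_left = (int(right) - _round_value(key, i, prev_right, modulus)) % modulus
        left, right = str(prev_left).zfill(width), prev_right
    return left + right


key = b"demo-key-not-for-production"
card = "4111111111111111"
token = fpe_encrypt(card, key)
print(token, len(token))
print(fpe_decrypt(token, key))
```

The token keeps the length and the digits-only shape of the input, so it fits the same column. This sketch does not preserve a checksum such as Luhn, so a validator that checks one would still reject the token.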

Generalization: bucketing and date shifting

What it is. Reduce precision instead of removing the value. Bucketing replaces an exact value with a range (an age becomes a band); date shifting moves dates by a consistent offset so intervals are preserved while exact dates are not.

When to use it. When the analytical value lives in the approximate quantity or the interval, not the exact figure, common in healthcare and research data sets.

Philterd tool. Phileas’s de-identification features: bucketing and date shifting.

Where it does not apply. Generalization alone can still re-identify in small or sparse populations. Combine it with other techniques, and validate the result against re-identification risk for your data.
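Both generalizations are small enough to sketch in Python. The function names, the ten-year band width, the one-year maximum shift, and the choice to derive one offset per subject from a keyed hash are assumptions for illustration; they are not Phileas's configuration or defaults.

```python
import hashlib
import hmac
from datetime import date, timedelta


def age_band(age: int, width: int = 10) -> str:
    """Bucket an exact age into a band such as 30-39."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def date_offset_days(subject_id: str, key: bytes, max_days: int = 365) -> int:
    """Derive one stable offset per subject, in the range [-max_days, max_days]."""
    digest = hmac.new(key, subject_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days


def shift_date(original: date, subject_id: str, key: bytes) -> date:
    """Shift a date by the subject's offset, so intervals for that subject survive."""
    return original + timedelta(days=date_offset_days(subject_id, key))


key = b"demo-key"
admitted = date(2023, 3, 1)
discharged = date(2023, 3, 9)
shifted_admitted = shift_date(admitted, "patient-17", key)
shifted_discharged = shift_date(discharged, "patient-17", key)

print(age_band(37))                                 # 30-39
print((shifted_discharged - shifted_admitted).days)  # 8
```

The length of stay is still eight days after shifting, while the calendar dates themselves have moved by an amount that depends on the key.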

Differential privacy

What it is. Add calibrated statistical noise to results so that any one individual’s presence or absence changes the likelihood of a given output by only a small, bounded amount, giving a mathematical privacy guarantee for aggregates.

When to use it. Publishing or sharing statistics and counts over sensitive data (how many records matched, trends over time, distributions) where you want the aggregate without exposing any individual.

Philterd tool. Philter Diffuse, which applies differential privacy to PII counts and aggregations.

Where it does not apply. This is the important scope limit: differential privacy here protects aggregate statistics, not documents or records. It does not redact a document, and it is not a way to release a private version of a row-level dataset. Use redaction or pseudonymization for the records themselves.
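A common way to add calibrated noise to a count is the Laplace mechanism, sketched below in Python. This is a generic textbook illustration, not a description of how Philter Diffuse works; the function names and the epsilon values are assumptions, and a floating-point sampler like this one is not a hardened implementation.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise scaled to sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0  # adding or removing one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random()
true_matches = 128  # e.g. how many documents contained a given entity type
print(noisy_count(true_matches, epsilon=1.0, rng=rng))   # usually close to 128
print(noisy_count(true_matches, epsilon=0.1, rng=rng))   # typically much noisier
```

A smaller epsilon means more noise and a stronger guarantee. The noise is applied to the aggregate only; the underlying documents are untouched, which is the scope limit described above.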

Synthetic data

What it is. Generate realistic but entirely fabricated data instead of using real personal data, so there is no individual to expose in the first place.

When to use it. Model training, test fixtures, demos, and shared examples. It sidesteps the consent and exposure problems of training or testing on real PII.

Philterd tool. Synthetic data underpins how the PhEye models are trained; see "The ethics of training: why we use synthetic data".

Where it does not apply. Synthetic data is drawn from a model of the real distribution, not the real distribution itself, so accuracy measured only on synthetic data is best read as an upper bound on real-world performance. Validate on real data before trusting a result.

On-device, self-hosted inference

What it is. Run detection and redaction inside your own infrastructure rather than calling a third-party API, so sensitive text never leaves your boundary. This is an architecture choice rather than a transformation, and it is foundational to the rest.

When to use it. Always, for regulated or sensitive data. It is the difference between reducing exposure and creating a new one by shipping the data to someone else to redact.

Philterd tool. The whole toolkit is built for it: PhEye serves the models in your infrastructure, Phileas embeds directly in your application, and nothing requires a model-provider account or text leaving your VPC.

Where it does not apply. It is not itself a privacy transformation; it is what makes the transformations above safe to run. Pair it with the right technique for each value.

LLM prompt and response guardrails

What it is. Redact PII from prompts before they reach a language model provider, and scan responses on the way back, so sensitive data does not leak into a third-party LLM or its logs.

When to use it. Any application that sends user or document text to an external LLM, including RAG systems and AI agents.

Philterd tool. The Philter AI Proxy, a drop-in proxy for OpenAI, Anthropic Claude, Bedrock, Gemini, Ollama, and OpenAI-compatible providers.

Where it does not apply. It governs what crosses the boundary to the provider; it does not change how the model reasons. Treat it as one control in a broader privacy strategy.
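The control flow of a guardrail (scrub the prompt, call the model, scrub the response) can be sketched as follows in Python. This is a conceptual illustration, not the Philter AI Proxy: the two regular expressions, the bracketed labels, and the `call_model` callable are hypothetical stand-ins, and a real deployment would rely on a proper detection engine rather than two patterns.

```python
import re
from typing import Callable, List

# Hypothetical patterns for illustration; real detection is far broader.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scrub(text: str) -> str:
    """Replace every detected value with a label for its entity type."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def guarded_completion(prompt: str, call_model: Callable[[str], str]) -> str:
    """Redact the prompt before it leaves, then scan the response on the way back."""
    safe_prompt = scrub(prompt)
    response = call_model(safe_prompt)
    return scrub(response)


# A fake model stands in for the external provider so the example runs offline.
seen_by_provider: List[str] = []


def fake_model(prompt: str) -> str:
    seen_by_provider.append(prompt)
    return f"Echo: {prompt} Contact bob@example.org."


answer = guarded_completion("Email alice@example.com about SSN 123-45-6789", fake_model)
print(seen_by_provider[0])
print(answer)
```

The provider only ever sees the scrubbed prompt, and anything sensitive the model emits is caught before it reaches the caller. How the model reasons over the scrubbed text is unchanged.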

Choosing a technique

| Technique | Use it when | Reversible? | Philterd tool |
|---|---|---|---|
| Redaction / masking | The value should not appear at all | No (suppressed) | Philter, Phileas |
| Pseudonymization | You need joins and correlation to survive | Only if the mapping is kept | Phileas + referential integrity |
| Encryption / FPE | The value must be recoverable, or keep its format | Yes, governed | Philter |
| Generalization | The approximate value or interval is enough | No | Phileas (bucketing, date shifting) |
| Differential privacy | You publish aggregate statistics | No (one-way) | Philter Diffuse |
| Synthetic data | You need data with no real individuals | N/A | Synthetic training data |

Most real policies combine several of these, chosen per entity type. Start with "Writing your first redaction policy" and the policy schema guide to put them into practice.
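Choosing per entity type amounts to a lookup from entity type to strategy. The Python sketch below shows that shape only. It is not the Philter policy schema: the entity labels, the strategy functions, and the fall-back to suppression for unknown types are assumptions for this example, and the input list stands in for whatever a detector would produce.

```python
from typing import Callable, Dict, List, Tuple

Strategy = Callable[[str], str]


def suppress(value: str) -> str:
    """Remove the value entirely."""
    return "[REDACTED]"


def keep_last_4(value: str) -> str:
    """Mask all but the last four characters; mask everything if too short."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


def band_age(value: str) -> str:
    """Generalize an exact age into a ten-year band."""
    low = int(value) // 10 * 10
    return f"{low}-{low + 9}"


# Hypothetical policy: one strategy per entity type.
POLICY: Dict[str, Strategy] = {
    "SSN": suppress,
    "CARD": keep_last_4,
    "AGE": band_age,
}


def apply_policy(
    entities: List[Tuple[str, str]],
    policy: Dict[str, Strategy],
    default: Strategy = suppress,
) -> List[Tuple[str, str]]:
    """Apply the strategy for each entity type, suppressing unknown types."""
    return [(kind, policy.get(kind, default)(value)) for kind, value in entities]


detected = [
    ("SSN", "123-45-6789"),
    ("CARD", "4111111111111111"),
    ("AGE", "37"),
    ("PHONE", "555-0100"),
]
result = apply_policy(detected, POLICY)
print(result)
```

Defaulting to suppression for an entity type the policy does not mention is a conservative choice made in this sketch: an unlisted type is removed rather than passed through.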

A note on guarantees

These techniques reduce privacy risk; they do not eliminate it, and detection is probabilistic. Be precise about terms (pseudonymization is not anonymization), match the technique to the use case, and validate any configuration against your own representative data. You remain responsible for the data you process.

Frequently asked questions

Which technique is 'anonymization'?

Only the irreversible ones. Full suppression (removing a value outright) and differential privacy over aggregates are genuinely anonymizing. Replacing a value with a consistent stand-in is pseudonymization, not anonymization, because the mapping can be reversed if it is kept. Encryption-based strategies are reversible by design. Using the precise term matters for compliance claims.

Can I combine techniques in one policy?

Yes, and you usually should. A single redaction policy can suppress one entity type, pseudonymize another, encrypt a third, and generalize a fourth. The technique is chosen per entity type and per use case, not once for the whole document.

Does any of this guarantee no PII gets through?

No. Detection is probabilistic and these techniques reduce risk; they do not eliminate it. Validate any configuration against your own representative data, and treat the result as one layer of a defense-in-depth strategy. You remain the data controller for the data you process.