
Shielding Your Search: Redacting PII and PHI in Elasticsearch with the Search Redact Plugin

In today’s data-driven world, safeguarding Personally Identifiable Information (PII) and Protected Health Information (PHI) is paramount. When you run a search platform like Elasticsearch, sensitive data must stay confidential even as it becomes searchable. Enter Search Redact, an open-source Elasticsearch plugin that uses the Phileas project to redact and de-identify PII and PHI in your search results.

This post explores how Search Redact can bolster your data privacy and security when using Elasticsearch. Search Redact is available on GitHub at https://github.com/philterd/search-redact-elasticsearch-plugin. There is also a version for OpenSearch.

What is Search Redact?

Search Redact is a specialized Elasticsearch plugin that integrates redaction and de-identification directly into your search workflow. Built on the open-source Phileas project, it identifies and masks sensitive information in the documents your searches return. This lets you search your data while reducing the risk of exposing PII or PHI, which supports compliance with regulations such as GDPR, CCPA, and HIPAA.

Phileas: The Engine Behind Search Redact

Search Redact leverages the Phileas project, a powerful engine for identifying and transforming sensitive data. Phileas offers a wide range of capabilities, including:

  • Named Entity Recognition (NER): Identifying and classifying named entities like people, organizations, locations, and dates.
  • Regular Expressions: Matching patterns for specific data formats like phone numbers, email addresses, and Social Security numbers.
  • Dictionaries: Using lists of known sensitive terms for redaction.
  • Customizable Rules: Defining your own specific redaction rules based on your unique data and requirements.
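To make the regular-expression and dictionary ideas concrete, here is a toy Python sketch of pattern- and dictionary-based redaction. It is not how Phileas is implemented and it does no NER. The patterns, the type labels ("email-address", "ssn", "dictionary"), and the redact function are hypothetical, and it assumes that %t in a redaction format stands for the type of item that matched, as the {{{REDACTED-%t}}} format in the example policy later in this post suggests.

```python
import re

# Toy illustration only: this is NOT Phileas. The type labels are
# hypothetical names chosen for this sketch.
PATTERNS = {
    "email-address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text, dictionary_terms=(), redaction_format="{{{REDACTED-%t}}}"):
    """Replace pattern matches and dictionary terms with redaction_format.

    Assumption: %t is replaced with the type of item that matched.
    """

    def replacement(filter_type):
        return redaction_format.replace("%t", filter_type)

    # Regular expressions: mask anything that looks like a known format.
    for filter_type, pattern in PATTERNS.items():
        text = pattern.sub(lambda m, t=filter_type: replacement(t), text)

    # Dictionaries: mask known sensitive terms as whole words, ignoring case.
    for term in dictionary_terms:
        term_pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        text = term_pattern.sub(lambda m: replacement("dictionary"), text)

    return text


print(redact("Contact jane.doe@example.com, SSN 123-45-6789, at Acme Clinic.",
             dictionary_terms=["Acme Clinic"]))
# Contact {{{REDACTED-email-address}}}, SSN {{{REDACTED-ssn}}}, at {{{REDACTED-dictionary}}}.
```

Real engines such as Phileas add far more (validation, NER, per-type strategies), but the core idea is the same: find spans by pattern or lookup, then replace them according to a configured format.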

By integrating Phileas, Search Redact benefits from its sophisticated analysis and transformation capabilities, providing a comprehensive solution for data protection.

Why use Search Redact?

  • Enhanced Data Privacy: Search Redact gives you granular control over what information is displayed in search results, preventing the accidental exposure of sensitive data.
  • Regulatory Compliance: By redacting PII and PHI, Search Redact helps your organization meet the stringent requirements of data privacy and security regulations.
  • Improved Security Posture: Because sensitive values are masked in the results a search returns, Search Redact reduces the risk of breaches involving that information.
  • Flexible and Customizable: Search Redact’s integration with Phileas allows for highly flexible configuration of redaction rules, tailored to your specific needs.
  • Open Source and Community Driven: Being open-source, Search Redact is free to use and benefits from community contributions and ongoing improvements.

How to Use Search Redact

  1. Installation: The first step is to install the Search Redact plugin within your Elasticsearch cluster. Refer to the Search Redact documentation on GitHub for detailed installation instructions specific to your Elasticsearch version.
  2. Defining Redaction Rules in a Policy (Using Phileas): This is the core of Search Redact’s functionality. In a policy, you tell Phileas which types of PII and PHI to protect (e.g., names, addresses, Social Security numbers, medical record numbers) and how to handle each one. You can use regular expressions, dictionaries, or Phileas’s pre-trained NER models.
  3. Testing and Validation: Once you’ve configured Search Redact, thorough testing is essential. Run searches against your data and verify that the sensitive information is being correctly redacted and de-identified.
  4. Integration with Elasticsearch Queries: After testing, add the Search Redact settings to the ext section of your Elasticsearch search requests. Redaction is then applied to the results of every search that includes those settings.
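For the testing step, a small automated check can complement manual review. The Python sketch below scans a parsed Elasticsearch search response and reports hits whose field still contains something that looks like an email address. It assumes the redacted text comes back in each hit's _source under the field named in the search-redact "field" setting; confirm where the plugin actually returns redacted values before relying on it. The function name find_unredacted and the sample response are hypothetical and hand-written, not real plugin output.

```python
import re

# Simple email-like pattern used to detect values that escaped redaction.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_unredacted(response, field, pattern=EMAIL):
    """Return (hit id, leaked value) pairs where `pattern` still matches.

    Assumption: redacted text appears in hit["_source"][field].
    """
    leaks = []
    for hit in response.get("hits", {}).get("hits", []):
        value = hit.get("_source", {}).get(field)
        if not isinstance(value, str):
            continue  # skip missing or non-text fields
        for match in pattern.finditer(value):
            leaks.append((hit.get("_id"), match.group()))
    return leaks


# Hand-written sample response: hit "2" simulates a missed redaction.
sample = {"hits": {"hits": [
    {"_id": "1", "_source": {"description": "Reach me at {{{REDACTED}}}"}},
    {"_id": "2", "_source": {"description": "Reach me at bob@example.org"}},
]}}

print(find_unredacted(sample, "description"))  # [('2', 'bob@example.org')]
```

In practice you would pass in json.loads() of a real search response and fail the test if the returned list is non-empty.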

The following is an example query that redacts email addresses from the description field.

```shell
curl -s http://localhost:9200/sample_index/_search -H "Content-Type: application/json" -d'
{
  "ext": {
    "search-redact": {
      "field": "description",
      "policy": "{\"identifiers\": {\"emailAddress\":{\"emailAddressFilterStrategies\":[{\"strategy\":\"REDACT\",\"redactionFormat\":\"{{{REDACTED-%t}}}\"}]}}}"
    }
  },
  "query": {
    "match_all": {}
  }
}'
```
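Because the policy is a JSON document embedded as a string inside the JSON request body, writing it by hand means escaping every inner quote, as the curl command does. A minimal sketch using only the Python standard library builds the same request body from a dict and lets json.dumps handle the escaping. The helper names build_redact_search and search are hypothetical; the ext / search-redact / field / policy layout and the policy contents are copied from the curl example.

```python
import json
import urllib.request


def build_redact_search(field, policy, query=None):
    """Build a _search body with a search-redact ext section.

    As in the curl example, the policy is sent as a JSON *string*,
    so it is serialized here and the whole body is serialized again later.
    """
    return {
        "ext": {
            "search-redact": {
                "field": field,
                "policy": json.dumps(policy),
            }
        },
        "query": query if query is not None else {"match_all": {}},
    }


# The email-address policy from the curl example, as a Python dict.
EMAIL_POLICY = {
    "identifiers": {
        "emailAddress": {
            "emailAddressFilterStrategies": [
                {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}
            ]
        }
    }
}


def search(base_url, index, body):
    """POST the body to <base_url>/<index>/_search and return the parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_redact_search("description", EMAIL_POLICY)
# Against a local cluster with the plugin installed (not run here):
# results = search("http://localhost:9200", "sample_index", body)
```

Keeping the policy as a dict also makes it easier to version-control and reuse across queries, since only the final request needs the stringified form.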

Conclusion

Search Redact, powered by Phileas, offers a robust and effective solution for protecting sensitive data within your Elasticsearch environment. By implementing Search Redact and defining appropriate redaction and de-identification rules, you can significantly reduce the risk of exposing PII and PHI, supporting compliance and enhancing data privacy. Consult the official Search Redact documentation on GitHub for the most up-to-date information and detailed instructions. Protecting sensitive data is a continuous process, and Search Redact can be a valuable tool in your data privacy strategy.

