Phirestream can redact over 30 types of sensitive information, including PII, PHI, and NPPI, from data streaming through Apache Kafka topics.
Phirestream works with Apache Kafka and redacts PII and PHI from data as it streams through Apache Kafka topics.
Phirestream can redact over 30 types of PII and PHI.
Phirestream works entirely within your cloud. Your data is never sent outside of your VPC.
Frequently asked questions about Phirestream. For any questions not answered here please refer to our Support.
Phirestream is an application that filters sensitive information from streaming text prior to that text being published to Apache Kafka. Phirestream works by acting as a proxy to Apache Kafka. Phirestream processes the published text to redact, remove, or encrypt the types of sensitive information you have defined in Phirestream’s settings. Phirestream then publishes the filtered text to your Apache Kafka brokers.
The goal of Phirestream is to keep sensitive information from entering your Apache Kafka topics and downstream pipelines and applications.
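The redact-then-publish flow described above can be sketched in a few lines of Python. This is only an illustration of the proxy idea, not Phirestream's implementation: the `redact` function, the single SSN-like pattern, the `{{REDACTED-ssn}}` placeholder, and the in-memory stand-in for a Kafka broker are all assumptions made for the example.

```python
import re

# Hypothetical pattern for one type of sensitive information (SSN-like strings).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    """Replace anything matching the hypothetical pattern with a placeholder."""
    return SSN_PATTERN.sub("{{REDACTED-ssn}}", text)


class InMemoryBroker:
    """Stand-in for a Kafka broker; keeps published messages per topic."""

    def __init__(self) -> None:
        self.topics = {}

    def publish(self, topic: str, message: str) -> None:
        self.topics.setdefault(topic, []).append(message)


class RedactingProxy:
    """Sits in front of the broker and filters each message before forwarding it."""

    def __init__(self, broker: InMemoryBroker) -> None:
        self.broker = broker

    def publish(self, topic: str, message: str) -> None:
        # The unfiltered text never reaches the broker.
        self.broker.publish(topic, redact(message))


broker = InMemoryBroker()
proxy = RedactingProxy(broker)
proxy.publish("notes", "Patient SSN is 123-45-6789.")
```

Because producers only talk to the proxy, the topic and everything downstream of it see the filtered text alone.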
Does Phirestream use ChatGPT or other third-party APIs?
No. Phirestream never transmits your text or documents to any third-party service.
Phirestream can run in a firewalled (or even air-gapped) environment. For example, if you are using AWS, you can deploy Phirestream to a private subnet and use security groups and network ACLs to prevent any outbound traffic from the Phirestream instance and its subnet. In fact, we recommend doing so to increase your overall security posture.
Phirestream is built upon Phileas, an open source engine for finding and redacting PII and PHI. Phirestream builds on Phileas to support Apache Kafka and to provide custom trained NLP models.
Everyone is welcome to check out the Phileas code to learn more about how Phirestream works, to submit an issue when one is found, and to contribute via pull requests. Phileas is licensed under the Apache License, version 2.0.
Sensitive information in streaming text can present difficult challenges. If downstream applications do not need the sensitive information, then its presence is an unnecessary security risk. By using Phirestream to keep sensitive information from ever entering the Apache Kafka topics, we can help keep the cluster and the downstream applications secure.
Phirestream can redact many types of PII, PHI, and other sensitive information. We are constantly adding new types of information and new versions of each type. For example, a person’s age may be written in many ways and we work to add new ways as we discover them. If you wish to discuss these types of information in depth, please contact us.
Some of the types of PII, PHI, and sensitive information identified by Phirestream are listed below:
Phirestream uses what we call filter profiles. A filter profile is a file that you give to Phirestream to tell it the types of sensitive information you want to identify. A filter profile lists the types of sensitive information (phone numbers, names, etc.), when to remove them, and how to remove them.
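To make the idea concrete, here is a small sketch of what a filter profile could contain and how it might be read. The JSON field names (`name`, `filters`, `type`, `condition`, `strategy`) are invented for illustration and are not Phirestream's actual schema; only the three ingredients (which types, when, and how) come from the description above.

```python
import json

# Hypothetical filter profile. The field names are illustrative only and
# are NOT Phirestream's real profile schema.
profile_json = """
{
  "name": "example-profile",
  "filters": [
    {"type": "phone-number", "condition": "always", "strategy": "REDACT"},
    {"type": "person-name", "condition": "confidence > 75", "strategy": "ENCRYPT"}
  ]
}
"""

profile = json.loads(profile_json)


def describe(profile: dict) -> list:
    """Summarize each entry: what to find, how to handle it, and when."""
    return [
        f"{f['type']}: {f['strategy']} when {f['condition']}"
        for f in profile["filters"]
    ]


summary = describe(profile)
```

Each entry pairs a type of sensitive information with a condition (when to act) and a strategy (how to act).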
Phirestream can be deployed as a container or in your cloud in just a few minutes. See below for links to Phirestream on the cloud marketplaces. For container-based deployments, please contact us.
Precision and recall depend greatly on your data. Each user’s data is different, so comparing these metrics across users would be comparing apples and oranges. A claim such as “Phirestream's F1 score is 99%” is meaningless if your data does not exactly match our test data (and we know it doesn’t!). If another vendor tells you their accuracy without even seeing your documents, be very, very cautious!
Instead, we will gladly accept some representative text and spend a few days gathering metrics specific to your data. We will provide you with the collected metrics along with the redacted text. This will give you an accurate overview of how Phirestream performed on your text.
Phirestream uses state-of-the-art natural language processing (NLP) technology to identify sensitive information in text. These NLP methods use trained models created from a large corpus of text. The process of applying the model to text is non-deterministic. Many factors can affect the identification of sensitive information in your text, such as how similar your text is to the corpus that was used to train the model, how the text is formatted, and the length of the text. For these reasons, it is important that you assess Phirestream’s performance before using it in a production system.
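For readers unfamiliar with the metrics, precision, recall, and F1 follow from three counts: entities correctly identified (true positives), text wrongly flagged (false positives), and entities missed (false negatives). The sketch below uses the standard definitions; the counts are made up for illustration and are not Phirestream results.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Compute precision, recall, and F1 from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts for illustration only.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```

With these counts, precision is 0.9 and recall is 0.75. The same system would produce different counts, and so different scores, on a different set of documents.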
The confidence value in the filter strategy condition can be used to tune the NLP engine’s detection. Each identified entity has an associated confidence score between 0 and 100 indicating the model’s estimate that the text is actually an entity, with 0 being the lowest confidence and 100 being the highest confidence. The confidence value in the filter strategy allows you to filter out entities based on the confidence. For example, the condition confidence > 75 means that entities with a confidence value of 75 or less will be ignored and entities with a confidence value greater than 75 will be filtered from the text.
Phirestream supports several platforms, and which platform is used may be determined by your choice of cloud provider. Other platforms may be available upon request.
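The effect of a condition such as confidence > 75 can be sketched as a simple threshold check. The `Entity` class and the function name below are hypothetical and exist only for this example; they are not part of Phirestream's API.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical detected entity with a confidence score from 0 to 100."""
    text: str
    confidence: float


def apply_confidence_condition(entities, threshold=75.0):
    """Keep only entities whose confidence is strictly greater than the threshold."""
    return [e for e in entities if e.confidence > threshold]


entities = [
    Entity("John Smith", 92.0),
    Entity("May", 40.0),
    Entity("Paris", 75.0),  # exactly at the threshold, so not strictly greater
]
to_filter = apply_confidence_condition(entities)
```

Here only "John Smith" would be filtered from the text. Raising the threshold reduces false positives at the cost of possibly missing real entities, while lowering it does the opposite.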
See Phirestream's home page for more details.
You can view Phirestream’s license agreement here.