Automatically Redacting PII and PHI from Files in Amazon S3 using Amazon Macie and Philter
The Philterd Team
October 2, 2024
Amazon Macie is "a data security service that discovers sensitive data using machine learning and pattern matching." With Amazon Macie you can find potentially sensitive information in files in your Amazon S3 buckets, but what do you do when Amazon Macie finds a file that contains an SSN, phone number, or other piece of sensitive information?
Philter is software that redacts PII, PHI, and other sensitive information from text. Philter runs entirely within your private cloud and does not require any external connectivity. Your data never leaves your private cloud and is not sent to any third party. In fact, you can run Philter without any external network connectivity, and we recommend doing so!
In this blog post we will show how you can use Philter alongside Amazon Macie, Amazon EventBridge, and AWS Lambda to find and redact PII, PHI, or other sensitive information in your files in Amazon S3. If you are setting this up for your organization and need help, feel free to reach out!
How it Works
Here's how it will work:
Amazon Macie will look for files in Amazon S3 buckets that contain potentially sensitive information.
When Amazon Macie identifies such a file, it publishes a finding about it as an event to Amazon EventBridge.
An Amazon EventBridge rule that detects events from Amazon Macie will invoke an AWS Lambda function.
The AWS Lambda function will use Philter to redact the file.
Setting it Up
Configuring Amazon Macie
The first thing we will do is enable Amazon Macie. The easiest way is to follow the steps in the Amazon Macie documentation to enable it in your account; it's just a few clicks. Once you have Amazon Macie configured, come back here to continue!
Creating the AWS Lambda Function
Next, we want to create an AWS Lambda function. This function will be invoked whenever a file in an Amazon S3 bucket is found to contain sensitive information. Our function will be provided the name of the bucket and the object's key. With that information, our function can retrieve the file, use Philter to redact the sensitive information, and either overwrite the existing file or write the redacted file to a new object.
The Lambda function will receive a JSON object that contains the details of the finding, including the file identified by Amazon Macie. It will look like this:
{
    "version": "0",
    "id": "event ID",
    "detail-type": "Macie Finding",
    "source": "aws.macie",
    "account": "AWS account ID (string)",
    "time": "event timestamp (string)",
    "region": "AWS Region (string)",
    "resources": [
        <-- ARNs of the resources involved in the event -->
    ],
    "detail": {
        <-- Details of a policy or sensitive data finding -->
    },
    "policyDetails": null,
    "sample": Boolean,
    "archived": Boolean
}
You can find more about the schema of the event in the Amazon Macie documentation. What matters most to us are the name of the bucket and the key of the object identified by Amazon Macie. In the detail section of the above JSON object, under resourcesAffected, there will be an s3Object that contains that information:
"s3Object": {
    "bucketArn": "arn:aws:s3:::my-bucket",
    "key": "sensitive.txt",
    "path": "my-bucket/sensitive.txt",
    "extension": "txt",
    "lastModified": "2023-10-05T01:32:21.000Z",
    "versionId": "",
    "serverSideEncryption": {
        "encryptionType": "AES256",
        "kmsMasterKeyId": "None"
    },
    "size": 807,
    "storageClass": "STANDARD",
    "tags": [],
    "publicAccess": false,
    "etag": "accdb2c550e3aa13610cbd87b91e3ec7"
}
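This information gives us the location of the identified file. The bucket name is the last part of bucketArn (my-bucket) and the key is sensitive.txt, so the file is s3://my-bucket/sensitive.txt. Now we can use Philter to redact this file!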
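Here is a minimal Python sketch of pulling that location out of the event inside the Lambda function. It makes two assumptions. First, the s3Object shown above sits under detail.resourcesAffected in the finding. Second, the bucket name can be taken from the end of bucketArn. The function name s3_location_from_finding is our own and is not part of any AWS SDK.

```python
from typing import Tuple


def s3_location_from_finding(event: dict) -> Tuple[str, str]:
    """Return (bucket, key) for the object named in a Macie finding event.

    Assumption: the object details live at detail.resourcesAffected.s3Object.
    """
    s3_object = event["detail"]["resourcesAffected"]["s3Object"]
    # A bucket ARN looks like arn:aws:s3:::bucket-name, so the bucket
    # name is everything after the last colon.
    bucket = s3_object["bucketArn"].rsplit(":", 1)[-1]
    return bucket, s3_object["key"]


if __name__ == "__main__":
    sample_event = {
        "source": "aws.macie",
        "detail": {
            "resourcesAffected": {
                "s3Object": {
                    "bucketArn": "arn:aws:s3:::my-bucket",
                    "key": "sensitive.txt",
                    "path": "my-bucket/sensitive.txt",
                }
            }
        },
    }
    bucket, key = s3_location_from_finding(sample_event)
    print(f"s3://{bucket}/{key}")
```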
You have a few choices here. You can have your AWS Lambda function grab that file from S3, redact it using Philter, and then overwrite the existing file. Or, you can choose to write it to a new file in S3 and preserve the original file. Which you do is up to you and your business requirements!
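Whichever option you choose, the main thing that changes is the key you write the redacted text to. Here is a small sketch of that choice. The redacted/ prefix is only a hypothetical naming convention, and so is the function name.

```python
def redacted_destination(key: str, overwrite: bool, prefix: str = "redacted/") -> str:
    """Pick the S3 key for the redacted output.

    overwrite=True replaces the original object in place. Otherwise the
    redacted copy goes under a separate (hypothetical) prefix and the
    original object is preserved.
    """
    if overwrite:
        return key
    return prefix + key
```

Writing to a separate bucket is another option. In that case you would change the destination bucket rather than the key.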
Redacting the File with Philter
To use Philter you must have an instance of it running! You can quickly launch Philter as an Amazon EC2 instance via the AWS Marketplace. In under 5 minutes you will have a running Philter instance ready to redact text via its API.
With Philter's API, you can use any programming language you like. There are client SDKs available for Java, .NET, and Go, but the Philter API is simple and easily callable from other languages like Python. Your Lambda function only needs network access to Philter's API at an endpoint like https://<philter-ip>:8080, for example by attaching the function to the VPC where Philter runs.
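As an example, here is a minimal sketch of calling Philter from Python using only the standard library. It assumes three things about the API: the filter endpoint is POST /api/filter, the policy name goes in the p query parameter, and the request body is plain text. Check these against the Philter API documentation for your version. The function name redact_with_philter is hypothetical.

```python
import ssl
import urllib.parse
import urllib.request
from typing import Optional


def redact_with_philter(text: str, endpoint: str, policy: str = "default",
                        context: Optional[ssl.SSLContext] = None,
                        timeout: float = 30.0) -> str:
    """POST text to Philter and return the redacted text.

    Assumption: Philter exposes POST {endpoint}/api/filter, reads the policy
    name from the "p" query parameter, and accepts a text/plain body.
    """
    query = urllib.parse.urlencode({"p": policy})
    url = f"{endpoint.rstrip('/')}/api/filter?{query}"
    request = urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
    # context lets you supply an SSLContext that trusts Philter's certificate.
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return response.read().decode("utf-8")
```

Your Philter instance may use a self-signed certificate. If so, you could pass context=ssl.create_default_context(cafile="philter-ca.pem") rather than turning off certificate verification. Here philter-ca.pem is a hypothetical file containing that certificate.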
Next, decide how you want to redact the file. Redaction in Philter is done via a policy, and you can set your policy based on your business needs. Perhaps you want to mask Social Security numbers, shift dates, redact email addresses, and replace people's names with randomly generated ones. You can create a Philter policy to do just that and apply it when calling Philter's API. See the Philter documentation to learn more about policies and to see some sample policies.
Once you have your AWS Lambda function and Philter policy the way you want them, you can deploy the Lambda function:
aws lambda create-function --function-name redact-with-philter \
--runtime python3.11 --handler lambda_function.lambda_handler \
--role arn:aws:iam::accountId:role/service-role/my-lambda-role \
--zip-file fileb://code.zip
Just update the values in that command as needed. Don't forget to set your AWS account ID in the role's ARN!
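The command above expects your code to be packaged in code.zip. Suppose your handler is a single lambda_function.py file with no third-party dependencies; boto3 is already included in the Lambda Python runtime. In that case, a sketch like the following would build the package. The function name is hypothetical, and the file names are assumptions chosen to match the --handler and --zip-file values above.

```python
import zipfile
from pathlib import Path


def build_deployment_package(source_file: str = "lambda_function.py",
                             zip_path: str = "code.zip") -> Path:
    """Zip the handler module so it sits at the root of the archive."""
    source = Path(source_file)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # arcname=source.name stores the file at the archive root, which is
        # where the lambda_function.lambda_handler setting looks for it.
        archive.write(source, arcname=source.name)
    return Path(zip_path)
```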
Configuring Amazon EventBridge
To create the Amazon EventBridge rule:
aws events put-rule --name MacieFindings --event-pattern "{\"source\":[\"aws.macie\"]}"
MacieFindings is the name we are giving the rule; you can choose any name. The response will contain the rule's ARN. Note it, because you will need it later.
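This rule matches every event whose source is aws.macie. To reason about which events will reach the function, here is a simplified local approximation of EventBridge's exact-value matching. In the pattern, each field lists the allowed values, and nested objects are matched recursively. This is not EventBridge's actual matcher, and it ignores content filters such as prefix or numeric matching.

```python
def matches_pattern(event: dict, pattern: dict) -> bool:
    """Return True if event matches pattern using exact-value matching only.

    Simplified model of EventBridge: a list in the pattern means "the event's
    value must equal one of these"; a dict means "match this nested object".
    """
    for field, expected in pattern.items():
        if field not in event:
            return False  # every field named in the pattern must be present
        actual = event[field]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches_pattern(actual, expected):
                return False
        elif actual not in expected:
            return False
    return True


# The pattern passed to put-rule above.
MACIE_PATTERN = {"source": ["aws.macie"]}
# An optional, narrower variant that also checks the detail-type.
NARROWER_PATTERN = {"source": ["aws.macie"], "detail-type": ["Macie Finding"]}

if __name__ == "__main__":
    finding = {"source": "aws.macie", "detail-type": "Macie Finding", "detail": {}}
    other = {"source": "aws.s3", "detail-type": "Object Created"}
    print(matches_pattern(finding, MACIE_PATTERN))
    print(matches_pattern(other, MACIE_PATTERN))
    print(matches_pattern(finding, NARROWER_PATTERN))
```

To narrow the rule to findings only, you could add "detail-type": ["Macie Finding"] to the event pattern, as NARROWER_PATTERN does.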
Now we want to specify the AWS Lambda function that will be invoked by our EventBridge rule:
aws events put-targets \
--rule MacieFindings \
--targets Id=1,Arn=arn:aws:lambda:region:accountId:function:redact-with-philter
Replace region and accountId in the function's ARN with your AWS Region and account ID. Lastly, we just need to give EventBridge permission to invoke the Lambda function:
aws lambda add-permission \
--function-name redact-with-philter \
--statement-id Sid \
--action lambda:InvokeFunction \
--principal events.amazonaws.com \
--source-arn arn:aws:events:region:accountId:rule/MacieFindings
Again, update the ARN as appropriate. It should match the rule ARN returned when you created the rule.
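Both commands need ARNs built from your Region and account ID. Here is a small sketch of how those ARNs are assembled, assuming the standard aws partition. Note that EventBridge rule ARNs use rule/<name>, which is the form put-rule returns. The helper names are our own.

```python
def lambda_function_arn(region: str, account_id: str, function_name: str,
                        partition: str = "aws") -> str:
    """ARN used as the put-targets target for the Lambda function."""
    return f"arn:{partition}:lambda:{region}:{account_id}:function:{function_name}"


def events_rule_arn(region: str, account_id: str, rule_name: str,
                    partition: str = "aws") -> str:
    """ARN of an EventBridge rule, used as add-permission's --source-arn."""
    # Rule ARNs separate the resource type and name with "/", not ":".
    return f"arn:{partition}:events:{region}:{account_id}:rule/{rule_name}"


if __name__ == "__main__":
    # us-east-1 and 123456789012 are placeholder values.
    print(lambda_function_arn("us-east-1", "123456789012", "redact-with-philter"))
    print(events_rule_arn("us-east-1", "123456789012", "MacieFindings"))
```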
Now, when Amazon Macie runs and finds potentially sensitive information in an object in one of your Amazon S3 buckets, an event will be sent to EventBridge, where the rule we created will invoke our Lambda function. The Lambda function sends the file's contents to Philter, which redacts them and returns the redacted text to the function. The function then writes the redacted text back to Amazon S3, either overwriting the original object or as a new object.
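Putting it together, here is a minimal sketch of what lambda_function.py might look like. It rests on several assumptions:

- The objects are UTF-8 text files. Philter redacts text, so binary formats such as PDF would need text extraction first.
- The object location is read from detail.resourcesAffected.s3Object.
- Philter's filter endpoint is POST /api/filter, with the policy in the p query parameter.
- The endpoint and policy come from hypothetical PHILTER_ENDPOINT and PHILTER_POLICY environment variables.

The function's IAM role also needs s3:GetObject and s3:PutObject on the bucket.

```python
import os
import urllib.parse
import urllib.request


def philter_redact(text: str) -> str:
    """Redact text via Philter (endpoint/policy from hypothetical env vars)."""
    endpoint = os.environ["PHILTER_ENDPOINT"]  # e.g. https://<philter-ip>:8080
    policy = os.environ.get("PHILTER_POLICY", "default")
    url = f"{endpoint.rstrip('/')}/api/filter?" + urllib.parse.urlencode({"p": policy})
    request = urllib.request.Request(url, data=text.encode("utf-8"),
                                     headers={"Content-Type": "text/plain"},
                                     method="POST")
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


def lambda_handler(event, context, s3_client=None, redact=philter_redact,
                   overwrite=False, prefix="redacted/"):
    """Fetch the object named in a Macie finding, redact it, and write it back."""
    if s3_client is None:
        import boto3  # included in the AWS Lambda Python runtime
        s3_client = boto3.client("s3")

    # Locate the flagged object (assumed path within the finding).
    s3_object = event["detail"]["resourcesAffected"]["s3Object"]
    bucket = s3_object["bucketArn"].rsplit(":", 1)[-1]
    key = s3_object["key"]

    # Read the object as UTF-8 text and send it to Philter.
    original = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    redacted = redact(original)

    # Overwrite in place, or keep the original and write under a prefix.
    destination = key if overwrite else prefix + key
    s3_client.put_object(Bucket=bucket, Key=destination, Body=redacted.encode("utf-8"))
    return {"bucket": bucket, "source_key": key, "redacted_key": destination}
```

The s3_client and redact parameters exist only so the handler can be tried locally with stand-ins. Lambda calls lambda_handler(event, context) and uses the defaults. If Philter uses a self-signed certificate, you would also need to pass urlopen an ssl.SSLContext that trusts it.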
Summary
In this blog post we have provided the framework for using Philter alongside Amazon Macie, Amazon EventBridge, and AWS Lambda to redact PII, PHI, and other sensitive information from files in Amazon S3 buckets.
If you need help setting this up, please reach out! We can help you through the steps.
Philter is available from the AWS Marketplace. Not using AWS? Philter is also available from the Google Cloud Marketplace and the Microsoft Azure Marketplace.