GenAI Content Filtering: How to Prevent Exposure of Sensitive Data

Learn how to implement content filtering with ChatGPT to prevent exposure of sensitive customer and company data.

See a live demo of this functionality in the Nightfall Playground.

The Data Sprawl Problem with Generative AI

Advancements in AI have led to the creation of generative AI systems like ChatGPT, which can generate human-like responses to text-based inputs. However, these inputs are at the discretion of the user and they aren’t automatically filtered for sensitive data.

This means that these systems can also be used to generate content from sensitive data, such as medical records, financial information, or personal details. In these cases, content filtering is crucial to prevent the unauthorized disclosure of sensitive data.

Similarly, content filtering is essential for ensuring compliance with data privacy laws and regulations. These laws require companies to protect sensitive data and prevent its unauthorized disclosure.

For example, consider a few real-world scenarios:

  • You are using OpenAI to help debug code or for code completion. If the code you input contains sensitive data such as an API key, that key will be transmitted to OpenAI. For example:
Hey ChatGPT, what's wrong with this code? 

import stripe
stripe.api_key = "sk_live_4eC39HqLyjWDarjtT1zdp7dc"

starter_subscription = stripe.Product.create(
  name="Starter Subscription",
  description="$12/Month subscription",
)
  • You are using OpenAI to help customer service agents respond to customer inquiries and troubleshoot issues. Support tickets contain customers’ sensitive PII, including credit card numbers and Social Security numbers. That data may get transmitted by your service agents to OpenAI.
  • You are using OpenAI to moderate content sent by patients or doctors in a health app you are building. These queries may contain sensitive protected health information (PHI) that gets transmitted unnecessarily to OpenAI.

Content filtering can be used to remove any sensitive data before it is processed by the AI system, ensuring that only the necessary information is used to generate content. This prevents sensitive data sprawl to AI systems.

In this guide, we will walk through an example of how to add content filtering to a service that uses an OpenAI GPT model through its APIs.

Standard Pattern for Using OpenAI Model APIs

A typical pattern for leveraging GPT is as follows:

  1. Get an API key and set environment variables
  2. Initialize the OpenAI SDK client (e.g. OpenAI Python client), or use the API directly to construct a request
  3. Construct your prompt and decide which endpoint and model are most applicable
  4. Send the request to OpenAI

Let’s look at a simple example in Python. We’ll ask a GPT model for an auto-generated response we can send to a customer who is asking our customer support team about an issue with their payment method. Note how easy it is to send sensitive data, in this case a credit card number, to ChatGPT.

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

prompt = "The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined.' How should I respond to the customer?"

completion = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "user", "content": prompt}
  ]
)

print("Here's a generated response you can send the customer:\n\n", completion['choices'][0].message.content)

🚨 This is a risky practice because now we are sending sensitive customer information to OpenAI. Next, let’s explore how we can prevent this while still getting the full benefit of using ChatGPT.

Adding Content Filtering to the Pattern

It is straightforward to update this pattern to use Nightfall to check for sensitive findings and ensure sensitive data isn’t sent out. Here’s how:

Step 1: Set Up Nightfall

Get an API key for Nightfall and set environment variables. Learn more about creating a Nightfall API key here. In this example, we’ll use the Nightfall Python SDK.
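
For reference, here is a minimal sketch of that setup, assuming you have installed the nightfall package (for example, via pip) and exported NIGHTFALL_API_KEY in your environment:

import os
from nightfall import Nightfall

# Read the API key from the environment and pass it explicitly; the client
# also falls back to the NIGHTFALL_API_KEY environment variable on its own.
nightfall = Nightfall(os.getenv("NIGHTFALL_API_KEY"))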

Step 2: Configure Detection

Create a pre-configured detection rule in the Nightfall dashboard, or an inline detection rule with the Nightfall API or SDK client.

💡 Consider using Redaction

Note that if you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.
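
As an illustration, here is a sketch of an inline detection rule with a substitution-based redaction config, using the same SDK calls as the full example further below. The Social Security Number detector identifier is an assumption for illustration; check Nightfall’s detector glossary for the exact names available to your account.

from nightfall import Confidence, DetectionRule, Detector, RedactionConfig

# Illustrative inline rule: flag likely US Social Security Numbers and replace
# each finding with a substitution phrase in the redacted payload.
# Note: the "US_SOCIAL_SECURITY_NUMBER" identifier is assumed; verify it in the detector glossary.
ssn_rule = DetectionRule([
    Detector(
        min_confidence=Confidence.LIKELY,
        nightfall_detector="US_SOCIAL_SECURITY_NUMBER",
        display_name="US Social Security Number",
        redaction_config=RedactionConfig(
            remove_finding=False,
            substitution_phrase="[REDACTED]"))
])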

Step 3: Classify, Redact, Filter

Send your outgoing prompt text in a request payload to the Nightfall API text scan endpoint.

The Nightfall API will respond with detections and the redacted payload, for example:

{
  "findings": [
    [
      {
        "finding": "458-02-6124",
       "redactedFinding": "***-**-****",
        "detector": {
          "name": "US Social Security Number",
          "uuid": "e30d9a87-f6c7-46b9-a8f4-16547901e069"
        },
        "confidence": "VERY_LIKELY",
        "location": {
          "byteRange": {
            "start": 39,
            "end": 50
          },
          "codepointRange": {
            "start": 39,
            "end": 50
          }
        },
        "redactedLocation": {
          "byteRange": {
            "start": 39,
            "end": 50
          },
          "codepointRange": {
            "start": 39,
            "end": 50
          }
        },
        "matchedDetectionRuleUUIDs": [],
        "matchedDetectionRules": [
          "My Match Rule"
        ]
      }
    ]
  ],
  "redactedPayload": [
    "Thanks for getting back to me. My SSN is ***-**-****."
  ]
}

For example, let’s say we send Nightfall the following:

The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined.' How should I respond to the customer?

We get back the following redacted text:

The customer said: 'My credit card number is [REDACTED] and the card is getting declined.' How should I respond to the customer?
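
To make this step concrete, here is a minimal sketch of the scan call itself, using the nightfall client from Step 1 and the credit card detection_rule defined in the full example below. The SDK returns one findings list per payload item, mirroring the structure of the JSON response above:

# Scan the outgoing prompt. scan_text returns the findings plus a redacted
# copy of each payload item when a redaction config is set.
payload = ["The customer said: 'My credit card number is 4916-6734-7572-5015 "
           "and the card is getting declined.' How should I respond to the customer?"]

findings, redacted_payload = nightfall.scan_text(payload, detection_rules=[detection_rule])

if findings[0]:
    # Sensitive data was detected; prefer the redacted copy of the prompt.
    print("Sensitive findings detected:", len(findings[0]))
    print("Redacted prompt:", redacted_payload[0])
else:
    print("No sensitive findings; the original prompt is safe to send as-is.")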

Step 4: Send Redacted Prompt to OpenAI

  • Review the response to see if Nightfall has returned sensitive findings:
    • If there are sensitive findings:
      • You can choose to specify a redaction config in your request so that sensitive findings are redacted automatically
      • Without a redaction config, you can simply break out of the conditional statement, throw an exception, etc.
    • If there are no sensitive findings, or if you chose to redact findings with a redaction config:
      • Initialize the OpenAI SDK client (e.g. OpenAI Python client), or use the API directly to construct a request
      • Construct your outgoing prompt
        • If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you
      • Use the OpenAI API or SDK client to send the prompt to the AI model

Python Example

Let's take a look at what this would look like in a Python example using the OpenAI and Nightfall Python SDKs:

import os
import openai
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall

nightfall = Nightfall() # By default Nightfall will read the NIGHTFALL_API_KEY environment variable
openai.api_key = os.getenv("OPENAI_API_KEY")

# The message you intend to send
prompt = "The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined.' How should I respond to the customer?"
payload = [ prompt ]

# Define an inline detection rule that looks for Likely Credit Card Numbers and redacts them
detection_rule = DetectionRule([
                            Detector(
                                min_confidence=Confidence.LIKELY,
                                nightfall_detector="CREDIT_CARD_NUMBER",
                                display_name="Credit Card Number",
                                redaction_config=RedactionConfig(
                                    remove_finding=False, 
                                    substitution_phrase="[REDACTED]")
                            )])

# Send the message to Nightfall to scan it for sensitive data
# Nightfall returns the sensitive findings, and a copy of your input payload with sensitive data redacted
findings, redacted_payload = nightfall.scan_text(
                        payload,
                        detection_rules=[detection_rule])

# If the message has sensitive data, use the redacted version, otherwise use the original message
if redacted_payload[0]:
    message_body = redacted_payload[0]
else:
    message_body = payload[0]

print("After content filtering - this is what will be sent to ChatGPT:\n\n", message_body, "\n\n----\n\n")

# Send prompt to OpenAI model for AI-generated response
completion = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "user", "content": message_body}
  ]
)

print("Here's a generated response you can send the customer:\n\n", completion['choices'][0].message.content)

Let’s take a look at the output printed to the console:

After content filtering - this is what will be sent to ChatGPT:

 The customer said: 'My credit card number is [REDACTED] and the card is getting declined.' How should I respond to the customer? 

----

Here's a generated response you can send the customer:

 I'm sorry to hear that your card is being declined. Can you please provide me with more information about the transaction you are trying to make? This will help me investigate the issue and provide you with a solution.

Safely Leveraging Generative AI

You'll see that the message we originally intended to send had sensitive data:

The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined.' How should I respond to the customer?

And the message we ultimately sent to OpenAI was the redacted version:

The customer said: 'My credit card number is [REDACTED] and the card is getting declined.' How should I respond to the customer?

OpenAI returns an equally useful response either way because it doesn’t need the sensitive data to generate a cogent reply. This means we were able to leverage ChatGPT just as easily, but we didn’t risk sending OpenAI any unnecessary sensitive data. Now you are one step closer to leveraging generative AI safely in an enterprise setting.