Use Cases

There are many use cases for a high-accuracy data classification and protection system like Nightfall. Here are some of the most popular ones to spark your imagination.

We can't wait to hear more about what you're planning to build. Reach out to us anytime to discuss your use case.

Protect sensitive data from being transferred to downstream third-party services like LLM APIs.

Motivation

  • Third-party APIs provide services that greatly augment the capabilities of your applications.
    • For example, generative AI (GenAI) LLMs can automatically generate content and can be accessed via APIs such as those from OpenAI and Anthropic.
    • Another example is communications APIs like SendGrid and Twilio, which provide communications infrastructure.
  • The challenge is that your application may send these services sensitive or confidential information they do not need. This poses data privacy risks because customer data is shared outside its intended scope. For example, LLMs accept very large inputs, or prompts, and these prompts may contain sensitive customer information.

Benefits

  • By filtering sensitive customer data out of API inputs, you can leverage cutting-edge third-party services and APIs without introducing the data privacy risks of oversharing sensitive or confidential information.
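A minimal sketch of this pattern, assuming a simple regex-based stand-in for a real detection service: user text is scanned and sensitive values are replaced with placeholders before the prompt leaves your application. The `PATTERNS` table and the `redact` and `build_safe_prompt` helpers are hypothetical illustrations, not the Nightfall API, and the two regexes are deliberately simplistic.

```python
import re

# Hypothetical, deliberately simple patterns standing in for a real
# detection service; production detectors need far more validation.
PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace each match with a [TYPE] placeholder and report which types were found."""
    found = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            found.append(label)
    return text, found


def build_safe_prompt(user_text: str) -> str:
    """Sanitize user text before it is forwarded to a third-party API such as an LLM."""
    clean, found = redact(user_text)
    if found:
        # Hypothetical hook: record that sensitive data was stripped.
        print(f"redacted: {', '.join(found)}")
    return clean


if __name__ == "__main__":
    prompt = "Summarize this ticket: my SSN is 123-45-6789, email jane@example.com"
    print(build_safe_prompt(prompt))
```

Running the script prints `redacted: US_SSN, EMAIL` followed by `Summarize this ticket: my SSN is [US_SSN], email [EMAIL]`. In practice, the cleaned text is what you would pass to the third-party client, and the regex table would be replaced by a higher-accuracy detector.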

Sanitize user input to prevent unnecessary collection or proliferation of sensitive customer data.

Motivation

  • Applications collect and store sensitive information from consumers. Users may “overshare” or enter information in the wrong place, so sensitive data ends up where it is not expected, and internal services may then proliferate or handle that data in unexpected ways. For example:
    • Fintech applications that ingest, store, and generate files containing PII, such as W-2s and paystubs.
    • Healthcare applications that handle protected health information (PHI) or SSNs.
  • Marketplaces and social media applications allow user-generated content that may contain sensitive or illicit material, such as profanity or toxic language.
  • Support channels accept whatever consumers send in, which can include highly sensitive or overshared information that is then exposed to support agents.
  • This data can arrive in a variety of unstructured formats, such as screenshots, images, documents, plaintext, and compressed folders or archives, so inspecting it requires high-quality text extraction.

Benefits

  • Scan data upon submission to reduce the chance that users input sensitive data that should not be collected or retained by your application or service. Warn users, or block them, when they enter sensitive data into form fields or file uploads.
  • Reduce collection of sensitive data types that could result in regulatory fines or brand damage if leaked or breached.
  • Limit exposure of sensitive data to internal personnel like support agents that could lead to accidental misuse or intentional theft.

Audit and remove sensitive data in data silos and processing workflows for compliance.

Motivation

  • Compliance regimes like FedRAMP, PCI, and HIPAA may require that sensitive data does not proliferate into unsanctioned data silos, such as project management systems, data warehouses, and logging infrastructure.
  • Many different development teams may be writing data into these internal services, such as logging and data warehousing, so enforcing data sanitization at ingress is challenging.
  • CDP and data integration tools like Segment and Fivetran can copy sensitive data from its original location into a broader set of data silos.
  • Data analytics and data science teams may replicate and transform data, leading to further copies and versions across internal systems.
  • Edge cases, unexpected errors, and stack traces can lead to sensitive data landing or replicating in application logs.

Benefits

  • Identify and remove sensitive data from places that it shouldn’t be.
  • Monitor data at rest in data silos instead of at points of ingress/egress that would be hard to monitor or track.
  • Scan extremely high volumes of unstructured data at scale.
  • Build workflows to delete data, redact data, or alert the right teams when sensitive data is found where it shouldn’t be.

Build data classification and DLP features directly into your SaaS application.

Motivation

  • Data classification and DLP capabilities are increasingly expected by regulated institutions such as big banks.
  • Building data classification and DLP from scratch is complex and carries a high opportunity cost, pulling developers away from the core product offering. A half-baked solution erodes customer trust, especially given the existing skepticism about the quality of traditional DLP solutions.
  • SaaS and security vendors can deliver additional customer value and drive additional revenue through premium enterprise feature tiers that include security features like DLP, SAML SSO, audit logging, and more.

Benefits

  • Reduce time-to-market by leveraging out-of-the-box components.
  • Reduce the overhead of an in-house data classification service, which requires text extraction services, detector research and tuning, machine learning model development and deployment, and maintenance and support.
  • Deliver best-in-class accuracy, reducing the risk of alert fatigue or of missing sensitive data, either of which erodes customer trust.

Centralize detection logic, custom detectors, and regexes in one place instead of embedding them directly in code, and reduce the number of regexes required.

Motivation

  • Detecting a single type of sensitive data well (e.g., a credit card number) can be complex, requiring research and ongoing maintenance as the detector evolves. This is especially challenging for esoteric detectors, such as region- or industry-specific ones.
  • Managing regexes and input validation is complex and evolving. For example, a regex embedded in code to validate a Google Docs link may need to be updated as the link format changes, as false positives are identified and accounted for, and as performance implications are observed.
  • Many data types cannot be detected accurately with a regex alone because they require validation, are heavily context-dependent, or are highly variable or high-entropy, so any regex ends up either overly sensitive or overly specific.
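To illustrate why a regex alone is not enough, the sketch below pairs a candidate pattern for card-like digit runs with the Luhn (mod 10) checksum, a public standard that payment card numbers satisfy. The pattern and the helper names are hypothetical and are not Nightfall's detector. Even this combination cannot tell whether a Luhn-valid number is actually a card in context, which is the kind of context dependence described above.

```python
import re

# Hypothetical candidate pattern: 13-19 digits, optionally separated by
# single spaces or hyphens. It matches the *shape* of a card number only.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn (mod 10) checksum."""
    total = 0
    # Walk digits right to left, doubling every second one.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Regex finds candidates; the checksum discards digit runs that cannot be card numbers."""
    results = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            results.append(digits)
    return results
```

With this sketch, `find_card_numbers("card 4111 1111 1111 1111, order 4111111111111112")` returns `["4111111111111111"]`: both digit runs match the regex, but only the first passes the checksum.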

Benefits

  • Leverage out-of-the-box detectors so no engineering time is spent researching, training, or tuning detectors. No need to reinvent the wheel. These detectors span the categories of PII, PCI, PHI, credentials and secrets, ID numbers, and more.
  • Reduce time spent finding, tuning, and sharing regular expressions.
  • Build upon out-of-the-box detectors with custom logic, instead of having to start from scratch with a regex or custom validation logic.

Improve accuracy of existing content inspection systems.

Motivation

  • Existing content inspection systems may yield a high degree of false positives (i.e. noise), leading to alert fatigue and significant time wasted on inaccurate alerts.
  • Conversely, existing solutions may be very limited in detection scope, leading to a high degree of false negatives (i.e. misses) and putting the business at risk when sensitive data goes undetected.

Benefits

  • Replace existing, brittle solutions with a highly accurate content inspection system.
  • Reduce engineering time spent analyzing false positives and attempting to tune them out.

Sanitize the labeled data used to train machine learning models.

Motivation

  • When training complex machine learning models, data scientists must compile and use large corpora of data to improve the accuracy of the trained model. Unknowingly including sensitive data in these corpora can lead to violations of compliance regimes like HIPAA, GDPR, or PCI.
  • Models that focus on health, finance, or public sector applications are particularly at risk of ingesting sensitive data that may violate industry-specific compliance mandates.
  • Labeled data is often ingested from unregulated sources like customer communications, emails, public repos, and more. Inspecting all of these input sources manually is untenable.
  • Additionally, this data may come in a variety of unstructured formats, such as screenshots, images, documents, plaintext, and compressed folders or archives, so inspecting it requires high-quality text extraction.

Benefits

  • Ensure the hygiene of the labeled data you use to train your machine learning models.
  • Reduce collection of sensitive data types that could result in regulatory fines or brand damage if leaked or breached.

Example use cases by team and industry.

  • Healthcare: Detect PHI to ensure HIPAA compliance in your apps
  • Financial services: Secure PII and PCI data, such as bank account numbers, payment card details, and Social Security numbers
  • E-commerce: Prevent costly data breaches of PII and PCI that can damage brand reputation
  • Education: Protect student and faculty privacy within applications
  • Customer support: Redact sensitive data in customer support systems, shielding agents from information they shouldn’t see
  • IT Operations: Search for API keys, credentials, and secrets across internal and external data silos
  • Product: Create custom solutions for data classification, DLP, content moderation and more within your applications
  • Compliance: Address PCI-DSS, HIPAA, FedRAMP, GDPR, CCPA, GLBA, FERPA, PHIPA, and more
  • People & Community: Moderate content to detect profanity and toxicity
  • Gaming: Detect profanity, toxicity, or even personal or financial information shared in community chat rooms