Entities and Terms to Know

This section is a guide to the Entities in the Nightfall API and other terms you will need to know when using the API.

Detectors

Detectors provide the logic to find potentially sensitive pieces of data.

When this logic detects such data, the Detector is considered "triggered."

Nightfall has numerous pre-built Detectors that are trained via machine learning. Detectors may also be defined with regular expressions or dictionaries. Their accuracy may be further refined with exclusion rules and context rules. Whether a Detector is triggered may be controlled by a minimum confidence threshold and a minimum number of findings, both set per Detector on a Detection Rule.

The built-in set of Detectors covers a number of different categories of data, including:

  • Standard PII (e.g. an email address or social security number)
  • Network (e.g. an IP Address)
  • Healthcare (e.g. US Medicare Beneficiary Number)
  • Finance (e.g. a credit card number or IBAN code)

The full set is enumerated in the Detector Glossary.

Custom Detectors

Nightfall also supports RE2 regexes and word lists for any custom detectors that you may want to implement.

Over time, we've aggregated the following regex library, which you're welcome to select from to save you some time. Please note that a regular expression is an established yet limited method that searches for pre-defined patterns, so your mileage may vary.

You can test regular expressions here.

You can define custom detectors in two ways: in the Nightfall Dashboard, by navigating to Detectors > New Detector > Regular expression, or by defining them inline.
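To make the two custom detector types concrete, here is a minimal local sketch in Python. It is not the Nightfall API: `word_list_detector`, the code names, and the `TICKET-` pattern are hypothetical, and Python's `re` module is a different engine from RE2, so the patterns are kept to syntax that both accept.

```python
import re

def word_list_detector(words, case_sensitive=False):
    """Build a function that returns every word-list entry found in a text."""
    if not words:
        raise ValueError("word list must not be empty")
    flags = 0 if case_sensitive else re.IGNORECASE
    # Escape each entry and try longer entries first so they are not shadowed.
    alternation = "|".join(
        re.escape(w) for w in sorted(words, key=len, reverse=True)
    )
    pattern = re.compile(rf"\b(?:{alternation})\b", flags)
    return lambda text: [m.group() for m in pattern.finditer(text)]

# Hypothetical word list of internal code names.
find_codenames = word_list_detector(["Project Falcon", "Bluebird"])
hits = find_codenames("Notes on project falcon and bluebird rollout.")

# Hypothetical regex detector for an internal ticket identifier.
TICKET = re.compile(r"\bTICKET-[0-9]{6}\b")
tickets = TICKET.findall("See TICKET-004217 for details.")
```

Trying a pattern locally like this can catch obvious mistakes early, but behaviour should still be confirmed against an RE2-based tester before the detector is created.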

Exclusion Rules

An exclusion rule is a regular expression or word list that is applied after a Detector has been triggered by its primary expression or word list, in order to eliminate false positives.

For instance, you may have a Detector designed to detect phone numbers. However, you may have a particular set of phone numbers that you use for testing purposes that are known not to be valid (e.g. they start with the prefix 555), and these should be ignored. Adding an exclusion rule allows you to prevent those matches from being returned by the API.

See: Using Exclusion Rules
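The phone number example can be sketched locally as a two-stage filter. This is only an illustration of the idea, not how Nightfall implements exclusion rules: both patterns and the function name `find_phone_numbers` are assumptions made for the example.

```python
import re

# Hypothetical primary pattern for US-style phone numbers (illustrative only).
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
# Hypothetical exclusion pattern: numbers that start with the test prefix 555.
EXCLUDE = re.compile(r"^555-")

def find_phone_numbers(text: str) -> list[str]:
    """Return primary matches that the exclusion pattern does not remove."""
    matches = PHONE.findall(text)
    return [m for m in matches if not EXCLUDE.search(m)]

findings = find_phone_numbers("Call 415-867-5309 or the test line 555-010-0199.")
```

Here the second number matches the primary pattern but is dropped by the exclusion pattern, so only the first number remains in `findings`.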

Context Rules

Context Rules are additional matching expressions for a Detector that may be used to adjust the confidence score of a match.

You may provide a regular expression and the number of leading or trailing characters within which that expression must match. When it does, the confidence of the finding is adjusted to a level that you specify.

For instance, if you found a sequence that appeared to be a social security number based on its length or formatting, you might boost the confidence score if it was preceded by text like “SSN” or “Social Security Number.”
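A rough sketch of the social security number example follows. It only checks leading characters, and the patterns, the 20-character window, and the confidence values chosen are all assumptions for illustration rather than Nightfall defaults.

```python
import re

# Hypothetical primary pattern and context pattern (illustrative only).
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CONTEXT = re.compile(r"SSN|Social Security Number", re.IGNORECASE)

def score_ssn_candidates(text, window_before=20,
                         base="POSSIBLE", boosted="VERY_LIKELY"):
    """Pair each candidate with a confidence, boosted when context precedes it."""
    results = []
    for m in SSN.finditer(text):
        # Look only at the characters immediately before the match.
        leading = text[max(0, m.start() - window_before):m.start()]
        confidence = boosted if CONTEXT.search(leading) else base
        results.append((m.group(), confidence))
    return results

with_context = score_ssn_candidates("SSN: 123-45-6789")
without_context = score_ssn_candidates("Order 123-45-6789")
```

A trailing window would work the same way, using the characters after `m.end()` instead.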

Returning Surrounding Context

You may request that a sequence of bytes of a given length be provided from before and after the text that triggers a Detection Rule.

This information can help you better understand whether or not something is an actual violation by observing the circumstances within which the detected text was found.

You are limited to a maximum of 40 bytes of context before the match and 40 bytes after it, for a total of 80 bytes overall.

See: Using Context
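The byte limits can be pictured with a small slicing helper. This is a local sketch only: `surrounding_context` and its parameter names are hypothetical, and the only figure taken from the text above is the 40-byte cap per side.

```python
MAX_CONTEXT_BYTES = 40  # per side, per the limit described above

def surrounding_context(data: bytes, start: int, end: int,
                        before: int = 40, after: int = 40):
    """Return (leading, trailing) context around data[start:end]."""
    # Clamp requested lengths to the range 0..40 bytes.
    before = max(0, min(before, MAX_CONTEXT_BYTES))
    after = max(0, min(after, MAX_CONTEXT_BYTES))
    leading = data[max(0, start - before):start]
    trailing = data[end:end + after]
    return leading, trailing

payload = b"a" * 100 + b"SECRET" + b"b" * 100
# Requesting 60 bytes per side is clamped to 40 per side.
pre, post = surrounding_context(payload, 100, 106, before=60, after=60)
```

When the match sits near the start or end of the payload, the helper simply returns fewer bytes on that side.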

Detection Rules

Detection Rules are aggregations of Detectors that are assigned a minimum confidence level. The identifiers of Detection Rules are used as a parameter to the API.

You may create Detection Rules as described in the section Creating Detection Rules and use their identifier as part of API calls to scan content.

Alternatively, you may specify Detection Rules programmatically in each API call, as described in the scan method documentation below.

A Detection Rule is composed of a list of Detectors with which you wish to scan each request payload. The rule may be triggered when any of its Detectors is satisfied, or only when all of them are. You can add up to 50 Detectors in total, of which at most 30 may be regular-expression custom detectors.

Additionally, each Detector in the Detection Rule is assigned a “minimum confidence” level (see below) and a minimum number of findings, which together determine whether the Detection Rule should be considered triggered.
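The way these settings combine can be sketched with a small local model. None of the names below come from the Nightfall request schema: `DetectorConfig`, its fields, the `kind` labels, and the `logic` argument are hypothetical, and the ordering of confidence levels is assumed to follow the list in the Confidence Levels section.

```python
from dataclasses import dataclass

# Assumed ordering, lowest to highest.
CONFIDENCE_ORDER = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE",
                    "LIKELY", "VERY_LIKELY"]

@dataclass
class DetectorConfig:
    name: str
    kind: str = "builtin"            # hypothetical labels: "builtin", "regex", "word_list"
    min_confidence: str = "POSSIBLE"
    min_findings: int = 1

def validate_rule(detectors):
    """Enforce the limits stated above: 50 Detectors, 30 of them regex."""
    if len(detectors) > 50:
        raise ValueError("a Detection Rule may hold at most 50 Detectors")
    if sum(1 for d in detectors if d.kind == "regex") > 30:
        raise ValueError("at most 30 regular-expression custom detectors")

def detector_triggered(detector, finding_confidences):
    """True when enough findings meet the Detector's minimum confidence."""
    threshold = CONFIDENCE_ORDER.index(detector.min_confidence)
    qualifying = [c for c in finding_confidences
                  if CONFIDENCE_ORDER.index(c) >= threshold]
    return len(qualifying) >= detector.min_findings

def rule_triggered(detectors, findings_by_detector, logic="ANY"):
    """Combine per-Detector results with ANY or ALL semantics."""
    validate_rule(detectors)
    results = [detector_triggered(d, findings_by_detector.get(d.name, []))
               for d in detectors]
    return any(results) if logic == "ANY" else all(results)

detectors = [
    DetectorConfig("EMAIL", min_confidence="LIKELY"),
    DetectorConfig("PHONE", kind="regex", min_findings=2),
]
findings = {"EMAIL": ["VERY_LIKELY"], "PHONE": ["POSSIBLE"]}
any_result = rule_triggered(detectors, findings, logic="ANY")
all_result = rule_triggered(detectors, findings, logic="ALL")
```

In this example the EMAIL Detector is satisfied but PHONE has only one finding where two are required, so the rule triggers under "ANY" and not under "ALL".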

Confidence Levels

Detection results will be returned with one of the following confidence values.

In practice, the API will only return detections assigned a confidence level of POSSIBLE or higher.

  • VERY_UNLIKELY
  • UNLIKELY
  • POSSIBLE
  • LIKELY
  • VERY_LIKELY

Learn more about what different confidence levels mean and how to choose the right minimum confidence level for your detection rule here.
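When post-processing results on the client side, it can help to treat the confidence values as an ordered scale. The sketch below assumes the levels are ordered as listed above; the numeric values and the helper `meets_minimum` are illustrative choices, not part of the API.

```python
from enum import IntEnum

class Confidence(IntEnum):
    # Numeric values are arbitrary; only their relative order matters here.
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    VERY_LIKELY = 5

def meets_minimum(confidence: str, minimum: str = "POSSIBLE") -> bool:
    """True when a confidence name is at or above the given minimum."""
    return Confidence[confidence] >= Confidence[minimum]

returned = ["POSSIBLE", "LIKELY", "VERY_LIKELY"]
kept = [c for c in returned if meets_minimum(c, "LIKELY")]
```

Raising the minimum from POSSIBLE to LIKELY in this way trades some recall for fewer false positives.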