Special File Types


Spreadsheets and Tabular Data

File scans of Microsoft Office, Apache Parquet, CSV, and tab-separated files provide additional properties for locating findings within the document, beyond the standard byteRange, codepointRange, and lineRange properties.

Findings will contain a columnRange and a rowRange that identify the specific row and column of the tabular data in which the finding occurs.

This functionality is applicable to the following mime types:

  • text/csv
  • text/tab-separated-values
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  • application/vnd.ms-excel

Apache Parquet data files are also accepted.

Below is a sample finding from a spreadsheet containing dummy PII, where an SSN was detected in the 2nd column of the 55th row.

{
   "findings":[
      {
         "path":"Sheet1 (5)",
         "detector":{
            "id":"e30d9a87-f6c7-46b9-a8f4-16547901e069",
            "name":"US social security number (SSN)",
            "version":1
         },
         "finding":"624-84-9182",
         "confidence":"LIKELY",
         "location":{
            "byteRange":{
               "start":2505,
               "end":2516
            },
            "codepointRange":{
               "start":2452,
               "end":2463
            },
            "lineRange":{
               "start":55,
               "end":55
            },
            "rowRange":{
               "start":55,
               "end":55
            },
            "columnRange":{
               "start":2,
               "end":2
            },
            "commitHash":""
         },
         "matchedDetectionRuleUUIDs":[
            "950833c9-8608-4c66-8a3a-0734eac11157"
         ],
         "matchedDetectionRules":[
            
         ]
      },
...
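As a rough illustration of how the rowRange and columnRange properties might be consumed, the sketch below turns a finding into a human-readable cell reference. It assumes the ranges are 1-based, which is inferred from the sample above (column 2 is described as the 2nd column) rather than stated outright. The helper names column_letter and describe_cell are hypothetical and are not part of any Nightfall SDK.

```python
def column_letter(index: int) -> str:
    """Convert a 1-based column number to a spreadsheet letter (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be 1 or greater")
    letters = ""
    while index > 0:
        # Shift to 0-based before dividing so that 26 maps to Z rather than rolling over.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def describe_cell(finding: dict) -> str:
    """Summarize where a tabular finding starts, using its rowRange and columnRange."""
    location = finding["location"]
    row = location["rowRange"]["start"]
    col = location["columnRange"]["start"]
    # Assumption: indices are 1-based, as the sample response suggests.
    return f'{finding["path"]}: row {row}, column {col} ({column_letter(col)})'


if __name__ == "__main__":
    # Trimmed copy of the sample finding, keeping only the fields used here.
    sample = {
        "path": "Sheet1 (5)",
        "location": {
            "rowRange": {"start": 55, "end": 55},
            "columnRange": {"start": 2, "end": 2},
        },
    }
    print(describe_cell(sample))
```

Only the start of each range is reported here; a finding that spans several cells would need the end values as well.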

Git Repositories

Nightfall provides special handling for archives of GitHub repositories.

Nightfall will scan the repository history to discover findings in individual commits, returning the hash of the commit in which each finding appears.

To scan the repository, you will first need to create a clone, for example:

git clone https://github.com/nightfallai/nightfall-go-sdk.git

This creates a clone of the Nightfall Go SDK.

You will then need to create an archive that can be uploaded using Nightfall's file scanning sequence.

zip -r directory.zip directory

Note that for this to work, the hidden .git directory, which holds the commit history, must be included in the archive.
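Git keeps a repository's commit history in its hidden .git directory, so it can be worth confirming that the archive actually contains it before uploading. The sketch below is one way to build and check such an archive with Python's standard zipfile module instead of the zip command. The function names zip_repository and contains_git_metadata are hypothetical, and the output path is assumed to lie outside the repository directory.

```python
import os
import zipfile


def zip_repository(repo_dir: str, zip_path: str) -> None:
    """Zip repo_dir recursively, keeping hidden entries such as .git."""
    # Entry names are stored relative to the parent, so the archive has a
    # single top-level folder, much like `zip -r directory.zip directory`.
    parent = os.path.dirname(os.path.abspath(repo_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, dirs, files in os.walk(repo_dir):
            # Directories are written too, so empty ones are preserved.
            for name in dirs + files:
                full_path = os.path.join(root, name)
                archive.write(full_path, os.path.relpath(full_path, parent))


def contains_git_metadata(zip_path: str) -> bool:
    """Return True if any archive entry lives under a .git directory."""
    with zipfile.ZipFile(zip_path) as archive:
        return any(".git" in name.split("/") for name in archive.namelist())


if __name__ == "__main__":
    # Assumes the clone from the earlier step exists in the working directory.
    zip_repository("nightfall-go-sdk", "nightfall-go-sdk.zip")
    print(contains_git_metadata("nightfall-go-sdk.zip"))
```

The check only looks for a path component named .git; it does not validate that the history inside is complete.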

When you initiate the file upload sequence with this file, the scan results you receive will have the commitHash property populated.

Using the Nightfall Go SDK archive created above, a simple example is to scan for URLs (i.e. strings starting with http:// or https://), which returns results such as the following:

{
   "findings":[
      {
         "path":"f607a067..53e59684/nightfall.go",
         "detector":{
            "id":"6123060e-2d9f-4f35-a7a1-743379ea5616",
            "name":"URL"
         },
         "finding":"https://api.nightfall.ai/\"",
         "confidence":"LIKELY",
         "location":{
            "byteRange":{
               "start":142,
               "end":168
            },
            "codepointRange":{
               "start":142,
               "end":168
            },
            "lineRange":{
               "start":16,
               "end":16
            },
            "rowRange":{
               "start":0,
               "end":0
            },
            "columnRange":{
               "start":0,
               "end":0
            },
            "commitHash":"53e59684d9778ceb0f0ed6a4b949c464c24d35ce"
         },
         "beforeContext":"tp\"\n\t\"os\"\n\t\"time\"\n)\n\nconst (\n\tAPIURL = \"",
         "afterContext":"\n\n\tDefaultFileUploadConcurrency = 1\n\tDef",
         "matchedDetectionRuleUUIDs":[
            "cda0367f-aa75-4d6a-904f-0311209b3383"
         ],
         "matchedDetectionRules":[
            
         ]
      },
 ...
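Because each finding carries its own commitHash, a natural next step is to group findings by commit so that each affected commit can be reviewed once. The sketch below shows one possible way to do that on a parsed response. The function name findings_by_commit is hypothetical, and treating an empty commitHash as "not tied to a commit" is an assumption based on the spreadsheet sample earlier on this page, where the property is an empty string.

```python
from collections import defaultdict


def findings_by_commit(findings: list) -> dict:
    """Map each commit hash to the list of matched strings found in that commit."""
    grouped = defaultdict(list)
    for finding in findings:
        commit = finding.get("location", {}).get("commitHash", "")
        # Assumption: an empty or missing hash means the finding did not
        # come from repository history, so it is skipped.
        if commit:
            grouped[commit].append(finding["finding"])
    return dict(grouped)


if __name__ == "__main__":
    # Trimmed copies of the two sample findings, keeping only the fields used here.
    response = {
        "findings": [
            {
                "finding": 'https://api.nightfall.ai/"',
                "location": {"commitHash": "53e59684d9778ceb0f0ed6a4b949c464c24d35ce"},
            },
            {"finding": "624-84-9182", "location": {"commitHash": ""}},
        ]
    }
    for commit, matches in findings_by_commit(response["findings"]).items():
        print(commit, matches)
```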

❗️

Sensitive Data in GitHub Repositories

If a finding in a GitHub repository is sensitive, it should be considered compromised and appropriate mitigation steps should be taken (e.g. secrets should be rotated).

To retrieve the specific commit, you will need to clone the repository, for example:

git clone https://github.com/nightfallai/nightfall-go-sdk.git

You can then check out the specific commit using the commit hash returned by Nightfall.

cd nightfall-go-sdk
git checkout 53e59684d9778ceb0f0ed6a4b949c464c24d35ce

Note that you are in a 'detached HEAD' state when working with this sort of checkout of a repository.

See also: Removing sensitive data from a repository