Supported MIME Types

The file scan API has first-class support for text extraction and scanning for all MIME types listed below.

📘

Handling of MIME Types Not Listed

Files with a MIME type that is not listed below, and that is not explicitly rejected, are processed using an unoptimized text extractor. As a result, the quality of text extraction for these types may vary.

Accepted Text and Derivatives

  • application/json
  • application/x-ndjson
  • application/x-php
  • text/calendar
  • text/css
  • text/csv
  • text/html
  • text/javascript
  • text/plain
  • text/tab-separated-values
  • text/tsv
  • text/x-php

Accepted Office Formats

  • application/pdf
  • application/vnd.openxmlformats-officedocument.presentationml.presentation
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document

Accepted Archive and Compressed File Types

  • application/bzip2
  • application/ear
  • application/gzip
  • application/jar
  • application/java-archive
  • application/tar+gzip
  • application/vnd.android.package-archive
  • application/war
  • application/x-bzip2
  • application/x-gzip
  • application/x-rar-compressed
  • application/x-tar
  • application/x-webarchive
  • application/x-zip-compressed
  • application/x-zip
  • application/zip

Accepted Image File Types

  • image/apng
  • image/avif
  • image/gif
  • image/jpeg
  • image/jpg
  • image/png
  • image/svg+xml
  • image/tiff
  • image/webp

Rejected MIME Types

The file scan API explicitly rejects requests for files whose MIME types are not conducive to text extraction or scanning. Examples of rejected MIME types include:

  • application/photoshop
  • audio/midi
  • audio/wav
  • video/mp4
  • video/quicktime
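
As an illustration of how a client might use these lists, the following minimal sketch classifies a MIME type before a file is submitted. It is not part of the file scan API: the names `SUPPORTED_MIME_TYPES`, `SAMPLE_REJECTED_MIME_TYPES`, `normalize_mime_type`, and `classify_mime_type` are hypothetical, and the sets simply copy the lists on this page. Because the rejected list above is only a sample, a result of "unlisted" means the file may go through the unoptimized extractor or may still be rejected by the API.

```python
# Hypothetical client-side pre-check; the sets copy the lists on this page.
SUPPORTED_MIME_TYPES = frozenset({
    # Text and derivatives
    "application/json", "application/x-ndjson", "application/x-php",
    "text/calendar", "text/css", "text/csv", "text/html", "text/javascript",
    "text/plain", "text/tab-separated-values", "text/tsv", "text/x-php",
    # Office formats
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    # Archives and compressed files
    "application/bzip2", "application/ear", "application/gzip",
    "application/jar", "application/java-archive", "application/tar+gzip",
    "application/vnd.android.package-archive", "application/war",
    "application/x-bzip2", "application/x-gzip",
    "application/x-rar-compressed", "application/x-tar",
    "application/x-webarchive", "application/x-zip-compressed",
    "application/x-zip", "application/zip",
    # Images
    "image/apng", "image/avif", "image/gif", "image/jpeg", "image/jpg",
    "image/png", "image/svg+xml", "image/tiff", "image/webp",
})

# Only the sample from this page; the API may reject other types as well.
SAMPLE_REJECTED_MIME_TYPES = frozenset({
    "application/photoshop", "audio/midi", "audio/wav",
    "video/mp4", "video/quicktime",
})


def normalize_mime_type(value: str) -> str:
    """Lowercase the type and drop parameters such as '; charset=utf-8'."""
    return value.split(";", 1)[0].strip().lower()


def classify_mime_type(value: str) -> str:
    """Return 'supported', 'rejected', or 'unlisted' for a MIME type string."""
    mime = normalize_mime_type(value)
    if mime in SUPPORTED_MIME_TYPES:
        return "supported"
    if mime in SAMPLE_REJECTED_MIME_TYPES:
        return "rejected"
    # Not on either list: expect the unoptimized extractor or a rejection.
    return "unlisted"
```

A caller could, for example, skip uploads classified as "rejected" and log a warning for "unlisted" types, since extraction quality for those may vary.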