Supports vision models from different providers like OpenAI, Azure OpenAI, Anthropic, AWS Bedrock, Google Gemini, etc.

Installation

npm install zerox

Zerox uses graphicsmagick and ghostscript for the PDF to image processing step. These should be pulled automatically, but you may need to manually install.

sudo apt-get update
sudo apt-get install graphicsmagick ghostscript

Usage

With file URL

import { zerox } from 'zerox';

const result = await zerox({
  filePath: 'https://omni-demo-data.s3.amazonaws.com/test/cs101.pdf',
  credentials: {
    apiKey: process.env.OPENAI_API_KEY,
  },
});

From local path

import { zerox } from 'zerox';
import path from 'path';

const result = await zerox({
  filePath: path.resolve(__dirname, '<path-to-your-file>'),
  credentials: {
    apiKey: process.env.OPENAI_API_KEY,
  },
});

Params

NameTypeDescription
filePathstringFile URL or local path
credentialsCredentialsProvider-specific credentials
cleanupboolean (optional)Clear images from tmp after run. Defaults to true
concurrencynumber (optional)Number of pages to run at a time. Defaults to 10
errorModeenum (optional)Supported enums: THROW, IGNORE. Defaults to IGNORE
extractOnlyboolean (optional)Set to true to only extract structured data using a schema. Defaults to false
extractPerPagestring[] (optional)Extract data per page instead of the entire document
imageDensitynumber (optional)DPI for image conversion. Defaults to 300
imageHeightnumber (optional)Maximum height for converted images. Defaults to 2048
llmParamsobject (optional)Additional parameters to pass to the LLM
maintainFormatboolean (optional)Slower but helps maintain consistent formatting. Defaults to false
maxRetriesnumber (optional)Number of retries to attempt on a failed page. Defaults to 1
maxTesseractWorkersnumber (optional)Maximum number of Tesseract workers. Defaults to -1
modelenum (optional)Model to use (see Supported Models). Defaults to OPENAI_GPT_4O
modelProviderenum (optional)Supported enums: OPENAI, BEDROCK, GOOGLE, AZURE. Defaults to OPENAI
customModelFunctionfunction (optional)Runs custom model function
outputDirstring (optional)Save combined result.md to a file
pagesToConvertAsImagesnumber[] (optional)Page numbers to convert to image as array
schemaobject (optional)Schema for structured data extraction
tempDirstring (optional)Directory to use for temporary files. Defaults to /os/tmp
trimEdgesboolean (optional)Trims pixels from all edges that contain values similar to the given background color, which defaults to that of the top-left pixel. Defaults to true

The maintainFormat option tries to return the markdown in a consistent format by passing the output of a prior page in as additional context for the next page. This requires the requests to run synchronously, so it’s a lot slower. But valuable if your documents have a lot of tabular data, or frequently have tables that cross pages.

Supported Models

Zerox supports a wide range of models across different providers:

Data Extraction

Zerox supports structured data extraction from documents using a schema. This allows you to pull specific information from documents in a structured format instead of getting the full markdown conversion.

Set extractOnly: true and provide a schema to extract structured data. The schema follows the JSON Schema standard.

Use extractPerPage to extract data per page instead of from the whole document at once.

You can also set extractionModel, extractionModelProvider, and extractionCredentials to use a different model for extraction than OCR. By default, the same model is used.

Response

Example

{
  "completionTime": 10038,
  "fileName": "invoice_36258",
  "inputTokens": 25543,
  "outputTokens": 210,
  "pages": [
    {
      "page": 1,
      "content": "# INVOICE 36258 ...",
      "contentLength": 747
    }
  ],
  "extracted": null,
  "summary": {
    "totalPages": 1,
    "ocr": {
      "failed": 0,
      "successful": 1
    },
    "extracted": null
  }
}