Node.js

Zerox supports vision models from multiple providers, including OpenAI, Azure OpenAI, Anthropic, AWS Bedrock, and Google Gemini.

Installation

npm install zerox

Zerox uses GraphicsMagick and Ghostscript for the PDF-to-image processing step. These should be installed automatically, but you may need to install them manually.

sudo apt-get update
sudo apt-get install graphicsmagick ghostscript

Usage

With file URL

import { zerox } from "zerox";

const result = await zerox({
  filePath: "https://omni-demo-data.s3.amazonaws.com/test/cs101.pdf",
  credentials: {
    apiKey: process.env.OPENAI_API_KEY,
  },
});

From local path

import { zerox } from "zerox";
import path from "path";

const result = await zerox({
  filePath: path.resolve(__dirname, "<path-to-your-file>"),
  credentials: {
    apiKey: process.env.OPENAI_API_KEY,
  },
});

Params

Name	Type	Description
`filePath`	string	File URL or local path
`credentials`	Credentials	Provider-specific credentials
`cleanup`	boolean (optional)	Clear images from tmp after run. Defaults to `true`
`concurrency`	number (optional)	Number of pages to run at a time. Defaults to `10`
`directImageExtraction`	boolean (optional)	Extract directly from document images instead of the markdown. Defaults to `false`
`errorMode`	enum (optional)	Supported enums: `THROW`, `IGNORE`. Defaults to `IGNORE`
`extractOnly`	boolean (optional)	Set to true to only extract structured data using a schema. Defaults to `false`
`extractPerPage`	string[] (optional)	Extract data per page instead of the entire document
`imageDensity`	number (optional)	DPI for image conversion. Defaults to `300`
`imageHeight`	number (optional)	Maximum height for converted images. Defaults to `2048`
`llmParams`	object (optional)	Additional parameters to pass to the LLM
`maintainFormat`	boolean (optional)	Slower but helps maintain consistent formatting. Defaults to `false`
`maxImageSize`	number (optional)	If set, images are automatically compressed to a size less than or equal to `maxImageSize`. Defaults to `15MB`
`maxRetries`	number (optional)	Number of retries to attempt on a failed page. Defaults to `1`
`maxTesseractWorkers`	number (optional)	Maximum number of Tesseract workers. Defaults to `-1`
`model`	enum (optional)	Model to use (see Supported Models). Defaults to `OPENAI_GPT_4O`
`modelProvider`	enum (optional)	Supported enums: `OPENAI`, `BEDROCK`, `GOOGLE`, `AZURE`. Defaults to `OPENAI`
`customModelFunction`	function (optional)	Runs custom model function
`outputDir`	string (optional)	Save combined result.md to a file
`pagesToConvertAsImages`	number[] (optional)	Page numbers to convert to image as array
`schema`	object (optional)	Schema for structured data extraction
`tempDir`	string (optional)	Directory to use for temporary files. Defaults to `/os/tmp`
`trimEdges`	boolean (optional)	Trims pixels from all edges that contain values similar to the given background color, which defaults to that of the top-left pixel. Defaults to `true`

The `maintainFormat` option tries to return the markdown in a consistent format by passing the output of the prior page in as additional context for the next page. This requires the requests to run sequentially rather than concurrently, so it is much slower, but it is valuable if your documents contain a lot of tabular data or tables that span pages.
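To make the trade-off concrete, here is a minimal sketch of the pattern that paragraph describes: pages are processed one at a time, and each call receives the previous page's output as context. This is not Zerox's actual implementation. The `ConvertPage` type and the `convertSequentially` function are hypothetical names introduced only for illustration.

```typescript
// Hypothetical signature: converts one page to markdown, optionally using the
// previous page's markdown as formatting context.
type ConvertPage = (page: string, priorPage?: string) => Promise<string>;

// Process pages strictly one at a time so that each call can see the output
// of the page before it. This is why the pattern cannot run concurrently.
async function convertSequentially(
  pages: string[],
  convertPage: ConvertPage
): Promise<string[]> {
  const results: string[] = [];
  let prior: string | undefined;
  for (const page of pages) {
    const markdown = await convertPage(page, prior);
    results.push(markdown);
    prior = markdown; // becomes the context for the next page
  }
  return results;
}
```

Because every call waits for the one before it, total time grows roughly linearly with page count, whereas the default mode can process up to `concurrency` pages at once.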

Supported Models

Zerox supports a wide range of models across different providers:

OpenAI

Azure

AWS Bedrock

Supported models:

anthropic.claude-3-5-haiku-20241022-v1:0
anthropic.claude-3-5-sonnet-20240620-v1:0
anthropic.claude-3-5-sonnet-20241022-v2:0
anthropic.claude-3-haiku-20240307-v1:0
anthropic.claude-3-opus-20240229-v1:0
anthropic.claude-3-sonnet-20240229-v1:0

Credentials:

{
  accessKeyId, // Required if no sessionToken
  region, // Required if no sessionToken
  secretAccessKey,
  sessionToken, // Required if no accessKeyId or secretAccessKey
}
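
As a rough sketch, the options for a Bedrock call could be assembled as follows. `buildBedrockOptions` is a hypothetical helper, not part of Zerox. The sketch assumes that `modelProvider` and `model` accept the plain string values shown in this document; in a TypeScript project you would likely use the enums exported by the SDK instead. The `AWS_*` variable names are the conventional AWS environment variables.

```typescript
// Credential fields as listed above. Which ones you need depends on whether
// you authenticate with access keys or with a session token.
interface BedrockCredentials {
  accessKeyId?: string;
  region?: string;
  secretAccessKey?: string;
  sessionToken?: string;
}

// Hypothetical helper: builds a zerox options object from environment-style
// variables (for example, process.env).
function buildBedrockOptions(
  filePath: string,
  env: Record<string, string | undefined>
) {
  const credentials: BedrockCredentials = {
    accessKeyId: env.AWS_ACCESS_KEY_ID,
    region: env.AWS_REGION,
    secretAccessKey: env.AWS_SECRET_ACCESS_KEY,
    sessionToken: env.AWS_SESSION_TOKEN,
  };
  return {
    filePath,
    credentials,
    // Assumption: the enum values are these plain strings.
    modelProvider: "BEDROCK",
    model: "anthropic.claude-3-5-sonnet-20241022-v2:0",
  };
}
```

The result would then be passed to `zerox(...)` in the same way as the options objects in the Usage examples, for instance `buildBedrockOptions("<path-to-your-file>", process.env)`.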

Google

Data Extraction

Zerox supports structured data extraction from documents using a schema. This allows you to pull specific information from documents in a structured format instead of getting the full markdown conversion.

Set `extractOnly: true` and provide a `schema` to extract structured data. The schema follows the JSON Schema standard. Use `extractPerPage` to extract data per page instead of from the whole document at once.

You can also set `extractionModel`, `extractionModelProvider`, and `extractionCredentials` to use a different model for extraction than for OCR. By default, the same model is used.
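
A minimal sketch of an extraction configuration might look like this. The schema fields (`invoiceNumber`, `total`, `lineItems`) and the `buildExtractionOptions` helper are illustrative and not part of Zerox. The sketch also assumes that the entries of `extractPerPage` name top-level schema properties.

```typescript
// JSON Schema describing the fields to extract. The field names are
// illustrative, chosen for an invoice-like document.
const invoiceSchema = {
  type: "object",
  properties: {
    invoiceNumber: { type: "string" },
    total: { type: "number" },
    lineItems: {
      type: "array",
      items: {
        type: "object",
        properties: {
          description: { type: "string" },
          amount: { type: "number" },
        },
      },
    },
  },
  required: ["invoiceNumber", "total"],
};

// Hypothetical helper: assembles the options for an extraction-only run.
function buildExtractionOptions(filePath: string, apiKey: string | undefined) {
  return {
    filePath,
    credentials: { apiKey },
    extractOnly: true, // skip the full markdown conversion
    schema: invoiceSchema,
    // Assumption: entries refer to top-level schema properties.
    extractPerPage: ["lineItems"],
  };
}
```

The returned object can be passed to `zerox(...)` like the options in the Usage examples, for instance with `process.env.OPENAI_API_KEY` as the `apiKey` argument.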

Response

Example

{
  "completionTime": 10038,
  "fileName": "invoice_36258",
  "inputTokens": 25543,
  "outputTokens": 210,
  "pages": [
    {
      "page": 1,
      "content": "# INVOICE 36258 ...",
      "contentLength": 747
    }
  ],
  "extracted": null,
  "summary": {
    "totalPages": 1,
    "ocr": {
      "failed": 0,
      "successful": 1
    },
    "extracted": null
  }
}
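
For TypeScript users, the response above can be described with types like the following. These are inferred from the example only and are not the SDK's official type definitions. Fields that are `null` in the example are typed as `unknown`, since their populated shape is not shown here. `toMarkdown` and `allPagesSucceeded` are hypothetical helpers.

```typescript
// Shape inferred from the example response above.
interface ZeroxPage {
  page: number;
  content: string;
  contentLength: number;
}

interface ZeroxResponse {
  completionTime: number;
  fileName: string;
  inputTokens: number;
  outputTokens: number;
  pages: ZeroxPage[];
  extracted: unknown;
  summary: {
    totalPages: number;
    ocr: { failed: number; successful: number };
    extracted: unknown;
  };
}

// Join the page contents, in page order, into a single markdown string.
function toMarkdown(response: ZeroxResponse): string {
  return [...response.pages]
    .sort((a, b) => a.page - b.page)
    .map((p) => p.content)
    .join("\n\n");
}

// True when no page failed and every page was converted.
function allPagesSucceeded(response: ZeroxResponse): boolean {
  const { totalPages, ocr } = response.summary;
  return ocr.failed === 0 && ocr.successful === totalPages;
}
```

Checking `summary.ocr.failed` is especially useful with the default `errorMode` of `IGNORE`, where failed pages do not raise an error.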
