Node.js SDK

Zerox SDK for Node.js

Supports vision models from different providers like OpenAI, Azure OpenAI, Anthropic, AWS Bedrock, Google Gemini, etc.

Installation

npm install zerox

Zerox uses graphicsmagick and ghostscript for the PDF to image processing step. These should be pulled automatically, but you may need to manually install.

On linux use:

sudo apt-get update
sudo apt-get install -y graphicsmagick

Usage

With file URL

import { zerox } from "zerox";

const result = await zerox({
  filePath: "https://omni-demo-data.s3.amazonaws.com/test/cs101.pdf",
  credentials: {
    apiKey: process.env.OPENAI_API_KEY,
  },
});

From local path

import { zerox } from "zerox";
import path from "path";

const result = await zerox({
  filePath: path.resolve(__dirname, "./cs101.pdf"),
  credentials: {
    apiKey: process.env.OPENAI_API_KEY,
  },
});

Params

Name
Type
Description

filePath

string

File URL or local path

credentials

Provider-specific credentials

cleanup

boolean (optional)

Clear images from tmp after run. Defaults to true

concurrency

number (optional)

Number of pages to run at a time. Defaults to 10

errorMode

enum (optional)

Supported enums:

THROW IGNORE

Defaults to IGNORE

extractOnly

boolean (optional)

Set to true to only extract structured data using a schema. Defaults to false

extractPerPage

string[] (optional)

Extract data per page instead of the entire document

imageDensity

number (optional)

DPI for image conversion. Defaults to 300

imageHeight

number (optional)

Maximum height for converted images. Defaults to 2048

llmParams

object (optional)

Additional parameters to pass to the LLM

maintainFormat

boolean (optional)

Slower but helps maintain consistent formatting. Defaults to false

maxRetries

number (optional)

Number of retries to attempt on a failed page. Defaults to 1

maxTesseractWorkers

number (optional)

Maximum number of Tesseract workers. Defaults to -1

model

enum (optional)

Model to use (see Supported Models). Defaults to OPENAI_GPT_4O

modelProvider

enum (optional)

Supported enums: OPENAI BEDROCK GOOGLE AZURE Defaults to OPENAI

customModelFunction

function (optional)

Runs custom model function

outputDir

string (optional)

Save combined result.md to a file

pagesToConvertAsImages

number[] (optional)

Page numbers to convert to image as array

schema

object (optional)

Schema for structured data extraction

tempDir

string (optional)

Directory to use for temporary files. Defaults to /os/tmp

trimEdges

boolean (optional)

Trims pixels from all edges that contain values similar to the given background color, which defaults to that of the top-left pixel. Defaults to true

The maintainFormat option tries to return the markdown in a consistent format by passing the output of a prior page in as additional context for the next page. This requires the requests to run synchronously, so it's a lot slower. But valuable if your documents have a lot of tabular data, or frequently have tables that cross pages.

Supported Models

Zerox supports a wide range of models across different providers:

Provider
Model
Credentials

Azure

  • gpt-4o

  • gpt-4o-mini

{
  apiKey, // Required
  endpoint, // Required
}

OpenAI

  • gpt-4o

  • gpt-4o-mini

{
  apiKey, // Required
}

AWS Bedrock

  • anthropic.claude-3-5-haiku-20241022-v1:0

  • anthropic.claude-3-5-sonnet-20240620-v1:0

  • anthropic.claude-3-5-sonnet-20241022-v2:0

  • anthropic.claude-3-haiku-20240307-v1:0

  • anthropic.claude-3-opus-20240229-v1:0

  • anthropic.claude-3-sonnet-20240229-v1:0

{
  accessKeyId, // Required if no sessionToken
  region, // Required if no sessionToken
  secretAccessKey,
  sessionToken, // Required if no accessKeyId or secretAccessKey
}

Google

  • gemini-1.5-flash

  • gemini-1.5-flash-8b

  • gemini-1.5-pro

  • gemini-2.0-flash-001

  • gemini-2.0-flash-lite-preview-02-05

{
  apiKey, // Required
}

Data Extraction

Zerox supports structured data extraction from documents using a schema. This allows you to pull specific information from documents in a structured format instead of getting the full markdown conversion.

Set extractOnly: true and provide a schema to extract structured data. The schema follows the JSON Schema standard.

Use extractPerPage to extract data per page instead of from the whole document at once.

You can also set extractionModel, extractionModelProvider, and extractionCredentials to use a different model for extraction than OCR. By default, the same model is used.

{
  
}

Response

Example

{
  completionTime: 10038,
  fileName: 'invoice_36258',
  inputTokens: 25543,
  outputTokens: 210,
  pages: [
    {
      page: 1,
      content: '# INVOICE # 36258\n' +
        '**Date:** Mar 06 2012  \n' +
        '**Ship Mode:** First Class  \n' +
        '**Balance Due:** $50.10  \n' +
        '## Bill To:\n' +
        'Aaron Bergman  \n' +
        '98103, Seattle,  \n' +
        'Washington, United States  \n' +
        '## Ship To:\n' +
        'Aaron Bergman  \n' +
        '98103, Seattle,  \n' +
        'Washington, United States  \n' +
        '\n' +
        '| Item                                       | Quantity | Rate   | Amount  |\n' +
        '|--------------------------------------------|----------|--------|---------|\n' +
        "| Global Push Button Manager's Chair, Indigo | 1        | $48.71 | $48.71  |\n" +
        '| Chairs, Furniture, FUR-CH-4421             |          |        |         |\n' +
        '\n' +
        '**Subtotal:** $48.71  \n' +
        '**Discount (20%):** $9.74  \n' +
        '**Shipping:** $11.13  \n' +
        '**Total:** $50.10  \n' +
        '---\n' +
        '**Notes:**  \n' +
        'Thanks for your business!  \n' +
        '**Terms:**  \n' +
        'Order ID : CA-2012-AB10015140-40974  ',
      contentLength: 747,
    }
  ],
  extracted: null,
  summary: {
    totalPages: 1,
    ocr: {
      failed: 0,
      successful: 1,
    },
    extracted: null,
  },
}

Last updated