Node.js SDK
Zerox SDK for Node.js
Supports vision models from different providers like OpenAI, Azure OpenAI, Anthropic, AWS Bedrock, Google Gemini, etc.
Installation
Zerox uses graphicsmagick
and ghostscript
for the PDF to image processing step. These should be pulled automatically, but you may need to manually install.
On linux use:
Usage
With file URL
From local path
Params
filePath
string
File URL or local path
cleanup
boolean (optional)
Clear images from tmp after run. Defaults to true
concurrency
number (optional)
Number of pages to run at a time. Defaults to 10
errorMode
enum (optional)
Supported enums:
THROW
IGNORE
Defaults to IGNORE
extractOnly
boolean (optional)
Set to true to only extract structured data using a schema. Defaults to false
extractPerPage
string[] (optional)
Extract data per page instead of the entire document
imageDensity
number (optional)
DPI for image conversion. Defaults to 300
imageHeight
number (optional)
Maximum height for converted images. Defaults to 2048
llmParams
object (optional)
Additional parameters to pass to the LLM
maintainFormat
boolean (optional)
Slower but helps maintain consistent formatting. Defaults to false
maxRetries
number (optional)
Number of retries to attempt on a failed page. Defaults to 1
maxTesseractWorkers
number (optional)
Maximum number of Tesseract workers. Defaults to -1
modelProvider
enum (optional)
Supported enums:
OPENAI
BEDROCK
GOOGLE
AZURE
Defaults to OPENAI
customModelFunction
function (optional)
Runs custom model function
outputDir
string (optional)
Save combined result.md to a file
pagesToConvertAsImages
number[] (optional)
Page numbers to convert to image as array
schema
object (optional)
Schema for structured data extraction
tempDir
string (optional)
Directory to use for temporary files. Defaults to /os/tmp
trimEdges
boolean (optional)
Trims pixels from all edges that contain values similar to the given background color, which defaults to that of the top-left pixel. Defaults to true
The maintainFormat
option tries to return the markdown in a consistent format by passing the output of a prior page in as additional context for the next page. This requires the requests to run synchronously, so it's a lot slower. But valuable if your documents have a lot of tabular data, or frequently have tables that cross pages.
Supported Models
Zerox supports a wide range of models across different providers:
Azure
gpt-4o
gpt-4o-mini
OpenAI
gpt-4o
gpt-4o-mini
AWS Bedrock
anthropic.claude-3-5-haiku-20241022-v1:0
anthropic.claude-3-5-sonnet-20240620-v1:0
anthropic.claude-3-5-sonnet-20241022-v2:0
anthropic.claude-3-haiku-20240307-v1:0
anthropic.claude-3-opus-20240229-v1:0
anthropic.claude-3-sonnet-20240229-v1:0
gemini-1.5-flash
gemini-1.5-flash-8b
gemini-1.5-pro
gemini-2.0-flash-001
gemini-2.0-flash-lite-preview-02-05
Data Extraction
Zerox supports structured data extraction from documents using a schema. This allows you to pull specific information from documents in a structured format instead of getting the full markdown conversion.
Set extractOnly: true
and provide a schema
to extract structured data. The schema follows the JSON Schema standard.
Use extractPerPage
to extract data per page instead of from the whole document at once.
You can also set extractionModel
, extractionModelProvider
, and extractionCredentials
to use a different model for extraction than OCR. By default, the same model is used.
Response
Example
Last updated