Quickstart

Zerox is a document processing library that uses AI vision models for OCR and data extraction. It is designed to handle complex document layouts, including tables, charts, and irregular formatting.

  • Universal Document Support: Process PDFs, DOCX, images, and more

  • Vision-Based Processing: Uses AI vision models to interpret page layout

  • Structured Output: Converts documents to clean Markdown or extracts specific data

  • Multi-Platform: Available for both Node.js and Python

How It Works

  1. You upload any supported document

  2. The document is automatically converted into a series of images

  3. Vision models process each page image

  4. The result is returned as clean, structured Markdown or as extracted data
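The four steps above can be illustrated with a small, self-contained sketch. It is not Zerox's implementation: `render_pages`, `fake_vision_model`, and `process_document` are hypothetical names, the "document" is plain bytes split on form feeds instead of a rasterized PDF, and the model call is a stub that echoes the page text. It only shows the overall shape: split into pages, process each page (with a concurrency limit), and join the per-page Markdown in order.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List


@dataclass
class Page:
    number: int   # 1-based page number
    image: bytes  # placeholder for a rendered page image


def render_pages(document: bytes) -> List[Page]:
    """Hypothetical stand-in for step 2 (document -> page images).

    A real converter would rasterize the file. Here each form-feed
    separated chunk counts as one page so the sketch stays runnable.
    """
    chunks = document.split(b"\f")
    return [Page(number=i + 1, image=chunk) for i, chunk in enumerate(chunks)]


async def fake_vision_model(page: Page) -> str:
    """Hypothetical stand-in for step 3 (a vision model returning Markdown)."""
    await asyncio.sleep(0)  # yield control, as a network call would
    text = page.image.decode("utf-8").strip()
    return f"## Page {page.number}\n\n{text}"


async def process_document(
    document: bytes,
    model: Callable[[Page], Awaitable[str]] = fake_vision_model,
    concurrency: int = 2,  # illustrative value only
) -> str:
    """Steps 2 to 4: render pages, process each one, join the Markdown."""
    pages = render_pages(document)
    limit = asyncio.Semaphore(concurrency)

    async def run(page: Page) -> str:
        async with limit:  # at most `concurrency` pages in flight
            return await model(page)

    # gather() returns results in input order, so pages stay in sequence.
    results = await asyncio.gather(*(run(p) for p in pages))
    return "\n\n".join(results)


if __name__ == "__main__":
    doc = b"Invoice 001\fTotal: 42 USD"
    print(asyncio.run(process_document(doc)))
```

Swapping `fake_vision_model` for a function that calls a real vision model would keep the rest of the flow unchanged, which is the main point of the sketch.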

Try out the hosted version here: https://getomni.ai/ocr-demo.

Getting Started

Zerox is available as both a Node.js and Python package.

SDK Feature Support

Feature                     Node.js                       Python
PDF Processing              ✓ (requires graphicsmagick)   ✓ (requires poppler)
Image Processing
OpenAI Support
Azure OpenAI Support
AWS Bedrock Support
Google Gemini Support
Vertex AI Support
Data Extraction             ✓ (schema)
Per-page Extraction         ✓ (extractPerPage)
Custom System Prompts                                     ✓ (custom_system_prompt)
Maintain Format Option      ✓ (maintainFormat)            ✓ (maintain_format)
Async API
Error Handling Modes        ✓ (errorMode)
Concurrent Processing       ✓ (concurrency)               ✓ (concurrency)
Temp Directory Management   ✓ (tempDir)                   ✓ (temp_dir)
Page Selection              ✓ (pagesToConvertAsImages)    ✓ (select_pages)
Orientation Correction      ✓ (correctOrientation)
Edge Trimming               ✓ (trimEdges)
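As a rough illustration of how the Python option names listed in the table might be passed together, here is a minimal sketch. Several things in it are assumptions rather than documented behavior: that the Python package exposes an async `zerox` function importable from `pyzerox`, that it accepts `file_path` and `model` keyword arguments, and that the options are plain keyword arguments. The helper names `build_python_options` and `convert` are hypothetical, and the default values are illustrative, not the library's defaults. Check the package's own documentation before relying on any of this.

```python
import asyncio
from typing import Any, Dict, List, Optional


def build_python_options(
    maintain_format: bool = False,
    concurrency: int = 4,  # illustrative, not the library default
    temp_dir: Optional[str] = None,
    select_pages: Optional[List[int]] = None,
    custom_system_prompt: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect the Python option names from the table, dropping unset ones."""
    options: Dict[str, Any] = {
        "maintain_format": maintain_format,
        "concurrency": concurrency,
        "temp_dir": temp_dir,
        "select_pages": select_pages,
        "custom_system_prompt": custom_system_prompt,
    }
    return {key: value for key, value in options.items() if value is not None}


async def convert(file_path: str, model: str, **options: Any) -> Any:
    """Hypothetical wrapper around the package's entry point."""
    # Assumption: `pyzerox` provides an async `zerox` function with these
    # keyword arguments. Imported lazily so the helper above runs without it.
    from pyzerox import zerox

    return await zerox(file_path=file_path, model=model, **options)


if __name__ == "__main__":
    opts = build_python_options(
        select_pages=[1, 3],
        custom_system_prompt="Return Markdown only.",
    )
    print(opts)
    # A real run needs the package, a document, and provider credentials:
    # result = asyncio.run(convert("invoice.pdf", "gpt-4o-mini", **opts))
```

Building the options separately from the call keeps the option names easy to inspect without needing provider credentials.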
