Overview

Zerox is a document processing library that uses AI vision models for OCR and data extraction. It is designed to handle complex document layouts, including tables, charts, and irregular formatting.

Universal Document Support: Process PDFs, DOCX, images, and more
Vision-Based Processing: Uses AI vision models for superior layout understanding
Structured Output: Converts documents to clean Markdown or extracts specific data
Multi-Platform: Available for both Node.js and Python

Getting Started

Zerox is available as both a Node.js and Python package.

Node.js

Python

How It Works

Upload any supported document.
The document is automatically converted into a series of images.
Vision models process each page image.
The result is returned as clean, structured Markdown or as extracted data.
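
The steps above can be sketched in a few lines of Python. This is an illustrative outline only, not the Zerox implementation: `process_document`, `to_markdown`, `PageResult`, and `fake_vision_model` are hypothetical names, the vision model is replaced by a stub, and the pages are assumed to be rendered to images already (the rendering step depends on external tools such as poppler or graphicsmagick and is left out).

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Iterable, List, Optional


@dataclass
class PageResult:
    page: int      # 1-based page number
    content: str   # Markdown produced for that page


async def process_document(
    page_images: List[bytes],
    ocr_page: Callable[[bytes], Awaitable[str]],
    concurrency: int = 10,
    select_pages: Optional[Iterable[int]] = None,
) -> List[PageResult]:
    """Run a vision-model callback over each page image (hypothetical helper)."""
    wanted = set(select_pages) if select_pages is not None else None
    # The semaphore caps how many pages are sent to the model at once.
    semaphore = asyncio.Semaphore(concurrency)

    async def run(page_number: int, image: bytes) -> PageResult:
        async with semaphore:
            return PageResult(page_number, await ocr_page(image))

    tasks = [
        run(number, image)
        for number, image in enumerate(page_images, start=1)
        if wanted is None or number in wanted
    ]
    # asyncio.gather returns results in the order the tasks were passed in,
    # so the pages stay in document order.
    return list(await asyncio.gather(*tasks))


def to_markdown(results: List[PageResult]) -> str:
    """Join per-page Markdown into a single document."""
    return "\n\n".join(result.content for result in results)


async def fake_vision_model(image: bytes) -> str:
    # Stand-in for a real vision-model request; it only echoes the bytes.
    await asyncio.sleep(0)
    return f"# Page text: {image.decode()}"


if __name__ == "__main__":
    pages = [b"alpha", b"beta", b"gamma"]  # placeholder "images"
    results = asyncio.run(process_document(pages, fake_vision_model, concurrency=2))
    print(to_markdown(results))
```

Passing the model call in as a callback keeps the sketch independent of any provider. A real callback would send the image to a vision model and return its Markdown response.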

Try out the hosted version here: https://getomni.ai/ocr-demo.

SDK Feature Support

The table below lists the features supported by each SDK, with the relevant option name in parentheses.

Feature	Node.js	Python
PDF Processing	✓ (requires graphicsmagick)	✓ (requires poppler)
Image Processing	✓	✓
OpenAI Support	✓	✓
Azure OpenAI Support	✓	✓
AWS Bedrock Support	✓	✓
Google Gemini Support	✓	✓
Vertex AI Support	✗	✓
Data Extraction	✓ (schema)	✗
Per-page Extraction	✓ (extractPerPage)	✗
Custom System Prompts	✗	✓ (custom_system_prompt)
Maintain Format Option	✓ (maintainFormat)	✓ (maintain_format)
Async API	✓	✓
Error Handling Modes	✓ (errorMode)	✗
Concurrent Processing	✓ (concurrency)	✓ (concurrency)
Temp Directory Management	✓ (tempDir)	✓ (temp_dir)
Page Selection	✓ (pagesToConvertAsImages)	✓ (select_pages)
Orientation Correction	✓ (correctOrientation)	✗
Edge Trimming	✓ (trimEdges)	✗
Direct Image Extraction	✓ (directImageExtraction)	✗
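
When porting a configuration from the Node.js SDK to the Python SDK, the option names in the table can be treated as a lookup. The sketch below is a hypothetical helper, not part of either SDK: `NODE_TO_PYTHON` and `translate_node_options` are illustrative names, and the mapping covers only the options the table lists for both SDKs. It renames keys and nothing more. Whether each value has the same format in both SDKs is an assumption that should be checked against each SDK's documentation.

```python
from typing import Any, Dict, List, Tuple

# Node.js option name -> Python option name, taken from the rows of the
# feature table where both SDKs list an option.
NODE_TO_PYTHON: Dict[str, str] = {
    "maintainFormat": "maintain_format",
    "concurrency": "concurrency",
    "tempDir": "temp_dir",
    "pagesToConvertAsImages": "select_pages",
}


def translate_node_options(
    options: Dict[str, Any],
) -> Tuple[Dict[str, Any], List[str]]:
    """Rename Node.js-style options to their Python names (hypothetical helper).

    Returns the renamed options plus a sorted list of option names that have
    no Python counterpart in the table (for example schema or trimEdges).
    """
    translated: Dict[str, Any] = {}
    unsupported: List[str] = []
    for name, value in options.items():
        if name in NODE_TO_PYTHON:
            translated[NODE_TO_PYTHON[name]] = value
        else:
            unsupported.append(name)
    return translated, sorted(unsupported)


if __name__ == "__main__":
    kwargs, skipped = translate_node_options(
        {"maintainFormat": True, "tempDir": "/tmp/zerox", "trimEdges": False}
    )
    print(kwargs)
    print(skipped)
```

Reporting the unsupported names instead of silently dropping them makes it obvious when a Node.js configuration relies on a feature the table marks as unavailable in Python.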
