Quickstart
Zerox is a streamlined document processing library that uses AI vision models for OCR and data extraction. It is designed to handle complex document layouts, including tables, charts, and irregular formatting.
Universal Document Support: Process PDFs, DOCX, images, and more
Vision-Based Processing: Uses AI vision models for superior layout understanding
Structured Output: Converts documents to clean Markdown or extracts specific data
Multi-Platform: Available for both Node.js and Python
How It Works
Upload any supported document
The document is automatically converted into a series of images
Vision models process each page image
The result is returned as clean, structured Markdown or as extracted data
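The steps above can be sketched as a small pipeline. This is only an illustration of the flow, not Zerox's actual implementation: `document_to_markdown`, `rasterize`, `transcribe`, and the two `fake_*` stand-ins are hypothetical names introduced here, and a real setup would plug in a PDF-to-image converter and a vision-model client in their place.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical type aliases for this sketch (not part of Zerox).
PageImage = bytes
Rasterizer = Callable[[str], List[PageImage]]
VisionModel = Callable[[PageImage], Awaitable[str]]


async def document_to_markdown(
    path: str,
    rasterize: Rasterizer,
    transcribe: VisionModel,
    concurrency: int = 4,
) -> str:
    """Convert a document to Markdown with one vision-model call per page."""
    pages = rasterize(path)  # step 2: document -> list of page images
    limit = asyncio.Semaphore(concurrency)  # cap simultaneous model calls

    async def process(page: PageImage) -> str:
        async with limit:
            return await transcribe(page)  # step 3: the model reads one page

    # gather() returns results in input order, so pages stay in document order.
    parts = await asyncio.gather(*(process(p) for p in pages))
    return "\n\n".join(parts)  # step 4: a single Markdown document


# Stand-ins so the sketch runs without a real rasterizer or model.
def fake_rasterize(path: str) -> List[PageImage]:
    return [f"{path}#page{i}".encode() for i in range(1, 4)]


async def fake_transcribe(page: PageImage) -> str:
    await asyncio.sleep(0)  # simulate an awaitable model call
    return f"## {page.decode()}"


if __name__ == "__main__":
    markdown = asyncio.run(
        document_to_markdown("report.pdf", fake_rasterize, fake_transcribe)
    )
    print(markdown)
```

Passing the rasterizer and the model as callables keeps the sketch independent of any particular provider, and the semaphore shows one common way to bound parallel page requests.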
Try out the hosted version here: https://getomni.ai/ocr-demo.
Getting Started
Zerox is available as both a Node.js and Python package.
Node.js README - npm package
Python README - pip package
SDK Feature Support
| Feature | Node.js | Python |
| --- | --- | --- |
| PDF Processing | ✓ (requires graphicsmagick) | ✓ (requires poppler) |
| Image Processing | ✓ | ✓ |
| OpenAI Support | ✓ | ✓ |
| Azure OpenAI Support | ✓ | ✓ |
| AWS Bedrock Support | ✓ | ✓ |
| Google Gemini Support | ✓ | ✓ |
| Vertex AI Support | ✗ | ✓ |
| Data Extraction | ✓ (schema) | ✗ |
| Per-page Extraction | ✓ (extractPerPage) | ✗ |
| Custom System Prompts | ✗ | ✓ (custom_system_prompt) |
| Maintain Format Option | ✓ (maintainFormat) | ✓ (maintain_format) |
| Async API | ✓ | ✓ |
| Error Handling Modes | ✓ (errorMode) | ✗ |
| Concurrent Processing | ✓ (concurrency) | ✓ (concurrency) |
| Temp Directory Management | ✓ (tempDir) | ✓ (temp_dir) |
| Page Selection | ✓ (pagesToConvertAsImages) | ✓ (select_pages) |
| Orientation Correction | ✓ (correctOrientation) | ✗ |
| Edge Trimming | ✓ (trimEdges) | ✗ |