Zerox is a document processing library that uses AI vision models for OCR and data extraction. It is designed to handle complex document layouts, including tables, charts, and irregular formatting.
Universal Document Support: Process PDFs, DOCX, images, and more
Vision-Based Processing: Uses AI vision models for superior layout understanding
Structured Output: Converts documents to clean Markdown or extracts specific data
Multi-Platform: Available for both Node.js and Python
How It Works
Upload any supported document.
The document is automatically converted into a series of images.
Vision models process each page.
The result is returned as clean, structured Markdown or as extracted data.
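The steps above can be sketched in plain Python to show how the pieces fit together. This is an illustrative outline only, not the library's implementation or API: `PageImage`, `render_pages`, `fake_vision_model`, and `document_to_markdown` are hypothetical names, the rendering and model calls are stubs, and the `concurrency` and `select_pages` parameters merely borrow their names from the feature list.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class PageImage:
    """One rendered page of the input document (hypothetical type)."""
    number: int  # 1-based page number
    data: bytes  # encoded image bytes


def render_pages(document: bytes, page_count: int) -> List[PageImage]:
    """Stand-in for the document-to-image step.

    A real converter would rasterize each page; this stub only labels them.
    """
    return [PageImage(number=n, data=document) for n in range(1, page_count + 1)]


async def fake_vision_model(page: PageImage) -> str:
    """Stand-in for a vision model call; returns Markdown for one page."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"## Page {page.number}\n\n(transcribed content)"


async def document_to_markdown(
    document: bytes,
    page_count: int,
    transcribe: Callable[[PageImage], Awaitable[str]],
    concurrency: int = 4,
    select_pages: Optional[List[int]] = None,
) -> str:
    """Render pages, transcribe them with bounded concurrency, join in page order."""
    pages = render_pages(document, page_count)
    if select_pages is not None:
        wanted = set(select_pages)
        pages = [p for p in pages if p.number in wanted]

    limit = asyncio.Semaphore(concurrency)

    async def run(page: PageImage) -> str:
        async with limit:  # at most `concurrency` pages in flight
            return await transcribe(page)

    # gather preserves input order, so the output follows page order
    chunks = await asyncio.gather(*(run(p) for p in pages))
    return "\n\n".join(chunks)


markdown = asyncio.run(
    document_to_markdown(
        b"%PDF-stub",
        page_count=3,
        transcribe=fake_vision_model,
        select_pages=[1, 3],
    )
)
print(markdown)
```

Passing the model call in as a callable keeps the sketch independent of any particular provider; a real client would replace `fake_vision_model`.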
Try out the hosted version here: https://getomni.ai/ocr-demo.
Getting Started
Zerox is available as both a Node.js and Python package.
Node.js README - npm package
Python README - pip package
SDK Feature Support
PDF Processing
✓ (requires graphicsmagick)
✓ (requires poppler)
Image Processing
OpenAI Support
Azure OpenAI Support
AWS Bedrock Support
Google Gemini Support
Vertex AI Support
Data Extraction
✓ (schema)
Per-page Extraction
✓ (extractPerPage)
Custom System Prompts
✓ (custom_system_prompt)
Maintain Format Option
✓ (maintainFormat)
✓ (maintain_format)
Async API
Error Handling Modes
✓ (errorMode)
Concurrent Processing
✓ (concurrency)
✓ (concurrency)
Temp Directory Management
✓ (tempDir)
✓ (temp_dir)
Page Selection
✓ (pagesToConvertAsImages)
✓ (select_pages)
Orientation Correction
✓ (correctOrientation)
Edge Trimming
✓ (trimEdges)
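Where a feature lists two option names, they appear to be the Node.js (camelCase) and Python (snake_case) spellings of the same setting. Under that assumption, a small helper can carry a Node.js-style options object over to Python keyword names. `NODE_TO_PYTHON` and `translate_options` are hypothetical names, not part of either SDK, and the sketch renames keys only: it does not check that both SDKs accept the same values for an option.

```python
from typing import Any, Dict, Tuple

# Pairs read off the feature list above: Node.js spelling -> Python spelling.
# Assumption: the first name under a feature is Node.js, the second is Python.
NODE_TO_PYTHON = {
    "maintainFormat": "maintain_format",
    "concurrency": "concurrency",
    "tempDir": "temp_dir",
    "pagesToConvertAsImages": "select_pages",
}


def translate_options(
    node_options: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split Node.js-style options into (renamed for Python, left untranslated)."""
    translated: Dict[str, Any] = {}
    untranslated: Dict[str, Any] = {}
    for name, value in node_options.items():
        if name in NODE_TO_PYTHON:
            translated[NODE_TO_PYTHON[name]] = value
        else:
            # No paired Python name is listed, so report it instead of guessing.
            untranslated[name] = value
    return translated, untranslated


python_kwargs, leftover = translate_options(
    {
        "maintainFormat": True,
        "tempDir": "/tmp/zerox",
        "pagesToConvertAsImages": [1, 2],
        "trimEdges": True,
    }
)
print(python_kwargs)
print(leftover)
```

Options that have only one listed spelling, such as `trimEdges`, are returned separately so the caller can decide how to handle them.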