Overview
Zerox is a document processing library that uses AI vision models for OCR and data extraction. It is designed to handle complex document layouts, including tables, charts, and irregular formatting.
- Universal Document Support: Process PDFs, DOCX, images, and more
- Vision-Based Processing: Uses AI vision models for superior layout understanding
- Structured Output: Converts documents to clean Markdown or extracts specific data
- Multi-Platform: Available for both Node.js and Python
Getting Started
Zerox is available as both a Node.js and Python package.
How It Works
- You upload any supported document
- The document is automatically converted into a series of images
- A vision model processes each page image
- Zerox returns clean, structured output as Markdown or as extracted data
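The steps above can be sketched roughly as follows. This is a conceptual illustration only, not Zerox's implementation or API: `pages_to_markdown`, `PageModel`, and `fake_model` are hypothetical names, the page images are assumed to be already rendered, and the vision model is replaced by a placeholder so the example runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical stand-in for a vision-model call: takes one page image (bytes)
# and returns Markdown for that page. Not part of Zerox's API.
PageModel = Callable[[bytes], Awaitable[str]]


async def pages_to_markdown(
    page_images: List[bytes], model: PageModel, concurrency: int = 2
) -> str:
    """Run the model over every page image, at most `concurrency` at a time."""
    limit = asyncio.Semaphore(concurrency)

    async def process(image: bytes) -> str:
        async with limit:
            return await model(image)

    # gather() preserves input order, so pages stay in document order.
    pages = await asyncio.gather(*(process(img) for img in page_images))
    return "\n\n".join(pages)


async def fake_model(image: bytes) -> str:
    """Placeholder model that only reports the size of the image it received."""
    await asyncio.sleep(0)
    return f"Page with {len(image)} bytes"


if __name__ == "__main__":
    images = [b"\x00" * 10, b"\x00" * 20]  # stand-ins for rendered page images
    print(asyncio.run(pages_to_markdown(images, fake_model)))
```

Processing pages independently is what makes a concurrency limit meaningful: each page is a separate model call, and the results are joined in page order at the end.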
Try out the hosted version here: https://getomni.ai/ocr-demo.
SDK Feature Support
The table below compares feature support in the Node.js and Python SDKs.
| Feature | Node.js | Python |
|---|---|---|
| PDF Processing | ✓ (requires graphicsmagick) | ✓ (requires poppler) |
| Image Processing | ✓ | ✓ |
| OpenAI Support | ✓ | ✓ |
| Azure OpenAI Support | ✓ | ✓ |
| AWS Bedrock Support | ✓ | ✓ |
| Google Gemini Support | ✓ | ✓ |
| Vertex AI Support | ✗ | ✓ |
| Data Extraction | ✓ (`schema`) | ✗ |
| Per-page Extraction | ✓ (`extractPerPage`) | ✗ |
| Custom System Prompts | ✗ | ✓ (`custom_system_prompt`) |
| Maintain Format Option | ✓ (`maintainFormat`) | ✓ (`maintain_format`) |
| Async API | ✓ | ✓ |
| Error Handling Modes | ✓ (`errorMode`) | ✗ |
| Concurrent Processing | ✓ (`concurrency`) | ✓ (`concurrency`) |
| Temp Directory Management | ✓ (`tempDir`) | ✓ (`temp_dir`) |
| Page Selection | ✓ (`pagesToConvertAsImages`) | ✓ (`select_pages`) |
| Orientation Correction | ✓ (`correctOrientation`) | ✗ |
| Edge Trimming | ✓ (`trimEdges`) | ✗ |
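When moving a configuration from the Node.js SDK to the Python SDK, the option names in the table above can be mapped mechanically. The sketch below is a hypothetical helper, not part of either SDK: `translate_options` and `NODE_TO_PYTHON` are illustrative names, and the mapping covers only the options the table lists for both platforms. It assumes values carry over unchanged, which may not hold for every option.

```python
from typing import Any, Dict, List, Tuple

# Option names taken from the feature table: Node.js name -> Python name.
NODE_TO_PYTHON = {
    "maintainFormat": "maintain_format",
    "concurrency": "concurrency",
    "tempDir": "temp_dir",
    "pagesToConvertAsImages": "select_pages",
}


def translate_options(
    node_options: Dict[str, Any],
) -> Tuple[Dict[str, Any], List[str]]:
    """Return (python_kwargs, skipped) for a dict of Node.js-style options.

    Options without a Python counterpart in the table (for example
    `schema` or `trimEdges`) are collected in `skipped` instead of
    being passed through.
    """
    python_kwargs: Dict[str, Any] = {}
    skipped: List[str] = []
    for name, value in node_options.items():
        if name in NODE_TO_PYTHON:
            python_kwargs[NODE_TO_PYTHON[name]] = value
        else:
            skipped.append(name)
    return python_kwargs, skipped


if __name__ == "__main__":
    kwargs, skipped = translate_options(
        {"maintainFormat": True, "tempDir": "/tmp/zerox", "trimEdges": False}
    )
    print(kwargs)
    print(skipped)
```

Returning the skipped names rather than silently dropping them makes it visible which Node.js-only features would be lost in the move.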