Zerox is a document processing library that uses AI vision models for OCR and data extraction. It is designed to handle complex document layouts, including tables, charts, and irregular formatting.

  • Universal Document Support: Process PDFs, DOCX, images, and more
  • Vision-Based Processing: Uses AI vision models for superior layout understanding
  • Structured Output: Converts documents to clean Markdown or extracts specific data
  • Multi-Platform: Available for both Node.js and Python

Getting Started

Zerox is available as both a Node.js and Python package.

How It Works

  1. Upload any supported document
  2. The document is automatically converted into a series of images
  3. Vision models process each page image
  4. Zerox returns clean, structured output as Markdown or extracted data
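
The steps above can be sketched in a few lines of Python. This is an illustrative outline only, not Zerox's implementation: `process_document`, `PageOCR`, and `fake_vision_model` are hypothetical names, the page images are assumed to already exist as raw bytes (step 2), and the vision-model call is replaced by a stub so the example runs without an API key.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical type: takes one page image (raw bytes) and returns Markdown.
PageOCR = Callable[[bytes], Awaitable[str]]


async def process_document(
    page_images: List[bytes],
    ocr_page: PageOCR,
    concurrency: int = 4,
) -> str:
    """Run `ocr_page` over every page image and join the results in page order."""
    limit = asyncio.Semaphore(concurrency)

    async def run_one(image: bytes) -> str:
        async with limit:  # at most `concurrency` pages are processed at once
            return await ocr_page(image)

    # gather() returns results in input order, so pages stay in sequence.
    pages = await asyncio.gather(*(run_one(img) for img in page_images))
    return "\n\n".join(pages)


async def fake_vision_model(image: bytes) -> str:
    """Stand-in for a real vision-model call; it only reports the image size."""
    await asyncio.sleep(0)
    return f"## Page ({len(image)} bytes)"


if __name__ == "__main__":
    images = [b"first-page-image", b"second-page-image"]
    print(asyncio.run(process_document(images, fake_vision_model, concurrency=2)))
```

Passing the model call in as a function keeps the page loop independent of any one provider, and the semaphore caps how many pages are in flight at a time.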

Try out the hosted version here: https://getomni.ai/ocr-demo.

SDK Feature Support

Feature support by SDK, with the relevant option name in parentheses:

| Feature | Node.js | Python |
| --- | --- | --- |
| PDF Processing | ✓ (requires graphicsmagick) | ✓ (requires poppler) |
| Image Processing |  |  |
| OpenAI Support |  |  |
| Azure OpenAI Support |  |  |
| AWS Bedrock Support |  |  |
| Google Gemini Support |  |  |
| Vertex AI Support |  |  |
| Data Extraction | ✓ (`schema`) |  |
| Per-page Extraction | ✓ (`extractPerPage`) |  |
| Custom System Prompts |  | ✓ (`custom_system_prompt`) |
| Maintain Format Option | ✓ (`maintainFormat`) | ✓ (`maintain_format`) |
| Async API |  |  |
| Error Handling Modes | ✓ (`errorMode`) |  |
| Concurrent Processing | ✓ (`concurrency`) | ✓ (`concurrency`) |
| Temp Directory Management | ✓ (`tempDir`) | ✓ (`temp_dir`) |
| Page Selection | ✓ (`pagesToConvertAsImages`) | ✓ (`select_pages`) |
| Orientation Correction | ✓ (`correctOrientation`) |  |
| Edge Trimming | ✓ (`trimEdges`) |  |
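
For the rows of the table that name an option in both SDKs, the naming difference (camelCase in Node.js, snake_case in Python) can be captured in a small lookup. The helper below is a hypothetical convenience, not part of either SDK. It translates option names only and assumes nothing about whether the two SDKs accept the same value formats for a given option.

```python
from typing import Any, Dict

# Node.js option name -> Python keyword, taken from the table rows
# that list a name for both SDKs.
NODE_TO_PYTHON = {
    "maintainFormat": "maintain_format",
    "concurrency": "concurrency",
    "tempDir": "temp_dir",
    "pagesToConvertAsImages": "select_pages",
}


def to_python_kwargs(node_options: Dict[str, Any]) -> Dict[str, Any]:
    """Rename Node.js-style option keys to their Python-style equivalents.

    Raises KeyError for options with no Python name listed in the table.
    Values are passed through unchanged.
    """
    unsupported = [name for name in node_options if name not in NODE_TO_PYTHON]
    if unsupported:
        raise KeyError(f"no Python equivalent listed for: {', '.join(unsupported)}")
    return {NODE_TO_PYTHON[name]: value for name, value in node_options.items()}
```

For example, `to_python_kwargs({"maintainFormat": True, "tempDir": "/tmp/zerox"})` returns `{"maintain_format": True, "temp_dir": "/tmp/zerox"}`, while an option such as `errorMode` raises `KeyError` because the table lists no Python name for it.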