2025-04-17
Extraction Citation

- Improves confidence in extraction accuracy
- Enables better debugging and fine-tuning of your extraction schemas
- Citation coordinates are also available in the API response
Long Document Processing

- Enhanced processing for extra-long receipts and wide documents
- Maintained high accuracy across unusual aspect ratios and complex layouts
- Smart handling of multi-page documents with continuous content flowing across pages
API
We’ve added 3 new configuration options to the /extract API endpoint:
- includeConfidence: Get detailed confidence scoring for each extracted field
- pageRange: Process only specific pages of long documents
- enableHybridExtraction: Combine rule-based and AI-based extraction
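As a rough illustration of how the three options might be combined in one request body, here is a minimal sketch. Only the option names (includeConfidence, pageRange, enableHybridExtraction) come from this changelog. The helper name `build_extract_options`, the boolean values, and the "1-3" string format for pageRange are assumptions, so check the extract documentation for the real request shape.

```python
import json


def build_extract_options(include_confidence=False, page_range=None, hybrid=False):
    """Collect the new /extract options into a JSON-serializable dict.

    Hypothetical helper: the value types used here are assumptions,
    not the documented request format.
    """
    options = {}
    if include_confidence:
        options["includeConfidence"] = True
    if page_range is not None:
        # Assumed format: a page-range string such as "1-3".
        options["pageRange"] = page_range
    if hybrid:
        options["enableHybridExtraction"] = True
    return options


if __name__ == "__main__":
    options = build_extract_options(include_confidence=True, page_range="1-3", hybrid=True)
    # This dict would be merged into the body sent to /extract.
    print(json.dumps(options, indent=2))
```

Options left at their defaults are omitted from the dict, so the request only carries the settings that were explicitly turned on.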
Improvements
- Added loading indicators and multi-document extractions in the schema builder
- Added a new parameter in /extract endpoint: enableHybridExtraction
- Enabled exporting OCR result to Microsoft Word (.docx)
- Made the sidebar collapsible
- Fixed nested table rendering
2025-03-13
Confidence intervals

- We return confidence scores for all extracted values and each page of the OCR result.
- From the API, scores are returned in a separate confidence object. You can read more on the format in the extract documentation.
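One way the separate confidence object could be used is to flag fields that need manual review. The sketch below assumes the confidence object maps each field name to a score between 0 and 1, mirroring the extracted values. That layout, the 0.8 threshold, the function name, and the sample data are all made up for illustration. The actual format is described in the extract documentation.

```python
def low_confidence_fields(values, confidence, threshold=0.8):
    """Return the names of extracted fields scoring below the threshold.

    Assumes `confidence` maps field names to scores in [0, 1]; this
    layout is an assumption, not the documented response format.
    """
    return sorted(
        name
        for name, score in confidence.items()
        if name in values and score < threshold
    )


if __name__ == "__main__":
    # Made-up example response, for illustration only.
    response = {
        "result": {"total": "42.10", "vendor": "Acme"},
        "confidence": {"total": 0.97, "vendor": 0.62},
    }
    flagged = low_confidence_fields(response["result"], response["confidence"])
    print(flagged)  # prints ['vendor']
```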
Improved schema builder

- Running Suggest schema will analyze all of the example files and attempt to build a generalized schema that covers the most common use cases.
- Between this and confidence intervals, it’s now way easier to test a variety of different document formats to maximize accuracy.
Better observability

- A document preview
- OCR + Extracted values + Confidence scores
- The full JSON result
Plus a few more improvements:
- We did a revamp on our API documentation. Much cleaner now!
- Added the Mistral model to our OCR Benchmark.
- Improved observability and retries for webhooks.