Changelog
2025-04-17
Extraction Citation
Today, we’re launching Extraction Citation, which allows you to trace any extracted field back to its location in the original document.
- Improves confidence in extraction accuracy
- Enables better debugging and fine-tuning of your extraction schemas
- Citation coordinates are also available in the API response
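To illustrate one way citation coordinates might be consumed, here is a minimal sketch that converts a citation's bounding box into pixel coordinates so the cited region can be highlighted on a rendered page. The Citation shape (a page number plus a box given as left/top/width/height normalized to [0, 1]) is a hypothetical assumption, not the documented response format. Check the API documentation for the actual field names and coordinate convention.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    # Hypothetical shape: a page number plus a bounding box whose values are
    # fractions of the page size (0.0 to 1.0). Not the documented API format.
    page: int
    left: float
    top: float
    width: float
    height: float


def to_pixel_box(c: Citation, page_width_px: int, page_height_px: int) -> tuple[int, int, int, int]:
    """Convert a normalized box to (x0, y0, x1, y1) pixel coordinates."""
    x0 = round(c.left * page_width_px)
    y0 = round(c.top * page_height_px)
    x1 = round((c.left + c.width) * page_width_px)
    y1 = round((c.top + c.height) * page_height_px)
    return x0, y0, x1, y1


# Example: a field cited near the top-left of a 1000 x 2000 px page render.
citation = Citation(page=1, left=0.1, top=0.2, width=0.3, height=0.05)
print(to_pixel_box(citation, 1000, 2000))
```

If the real coordinates turn out to be absolute (e.g. in points or pixels) rather than normalized, the scaling step would simply be replaced by a unit conversion.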
Long Document Processing
We’ve added support for documents of any size or orientation, particularly ultra-long or ultra-wide formats.
- Enhanced processing for extra-long receipts and wide documents
- Maintained high accuracy across unusual aspect ratios and complex layouts
- Smart handling of multi-page documents with continuous content flowing across pages
2025-03-13
Confidence intervals
Confidence intervals have been a major work in progress for us over the last couple of months, and I’m excited to announce they’re finally live!
- We now return confidence scores for every extracted value and for each page of the OCR result.
- In the API response, scores are returned in a separate confidence object. You can read more about the format in the extract documentation.
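As a rough illustration of how per-field scores might be used, here is a minimal sketch that flags low-confidence values for manual review. It assumes a hypothetical, flat response where "result" holds the extracted values and "confidence" mirrors the same keys with scores between 0 and 1; the real key names and nesting are described in the extract documentation, and the 0.8 threshold is an arbitrary example.

```python
from typing import Any

# Hypothetical response shape (not the documented format): "confidence"
# mirrors the keys of "result", with one score in [0, 1] per field.
response = {
    "result": {"invoice_number": "INV-001", "total": 120.5, "vendor": "Acme"},
    "confidence": {"invoice_number": 0.98, "total": 0.62, "vendor": 0.91},
}


def flag_low_confidence(
    result: dict[str, Any],
    confidence: dict[str, float],
    threshold: float = 0.8,
) -> dict[str, Any]:
    """Return the extracted fields whose score is below the threshold.

    A field with no score at all is treated as 0.0, so it is always flagged.
    """
    return {
        key: value
        for key, value in result.items()
        if confidence.get(key, 0.0) < threshold
    }


needs_review = flag_low_confidence(response["result"], response["confidence"])
print(needs_review)  # {'total': 120.5}
```

Nested schemas (objects or arrays of line items) would need a recursive walk over both structures, but the thresholding idea stays the same.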
Improved schema builder
You can now upload multiple files to build and evaluate each template.
- Running "Suggest schema" will analyze all of the example files and attempt to build a generalized schema that covers the most common use cases.
- Between this and confidence intervals, it’s now way easier to test a variety of document formats to maximize accuracy.
Better observability
We made some big updates to our Usage view! You can now click into each API request to view and download:
- A document preview
- OCR + Extracted values + Confidence scores
- The full JSON result
Plus a few more improvements:
- We revamped our API documentation. Much cleaner now!
- Added the Mistral model to our OCR Benchmark.
- Improved observability and retries for webhooks.