2025-04-17

Extraction Citation

Today, we’re launching Extraction Citation, which allows you to trace any extracted field back to its location in the original document.

  • Improves confidence in extraction accuracy
  • Enables better debugging and fine-tuning of your extraction schemas
  • Citation coordinates are also available in the API response
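
As a rough illustration of how citation coordinates might be used, here is a minimal sketch that maps each cited field to a pixel box on its page, ready to draw over a rendered page image. The response shape is an assumption made for this example: the `citations` key, the `page` and `bbox` names, and the normalized 0 to 1 coordinates are all hypothetical, so check the API documentation for the actual format.

```python
# Hypothetical response shape: "citations", "page", and "bbox" are assumed
# names, and bbox is assumed to be (x0, y0, x1, y1) normalized to 0..1.
sample_response = {
    "result": {"total": "42.10"},
    "citations": {
        "total": {"page": 1, "bbox": [0.5, 0.25, 0.75, 0.5]},
    },
}


def citation_pixel_boxes(response, page_sizes):
    """Map each cited field to (page, pixel box).

    page_sizes maps a page number to its rendered (width, height) in pixels.
    """
    boxes = {}
    for field, cite in response.get("citations", {}).items():
        width, height = page_sizes[cite["page"]]
        x0, y0, x1, y1 = cite["bbox"]
        # Scale the normalized corners to pixel coordinates.
        pixel_box = (
            round(x0 * width),
            round(y0 * height),
            round(x1 * width),
            round(y1 * height),
        )
        boxes[field] = (cite["page"], pixel_box)
    return boxes


boxes = citation_pixel_boxes(sample_response, {1: (1000, 2000)})
print(boxes)
```

Fields without a citation entry are simply absent from the returned mapping, so callers can tell "not cited" apart from "cited at the page origin".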

Long Document Processing

We’ve added support for handling documents of any size or orientation, particularly ultra-long or wide formats.

  • Enhanced processing for extra-long receipts and wide documents
  • Maintained high accuracy across unusual aspect ratios and complex layouts
  • Smart handling of multi-page documents with continuous content flowing across pages

2025-03-13

Confidence intervals

Confidence intervals have been a major work in progress for us over the last couple of months, and I’m excited to announce they’re finally live!

  • We return confidence scores for all extracted values and each page of the OCR result.
  • From the API, scores are returned in a separate confidence object. You can read more on the format in the extract documentation.
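
One common use of these scores is routing uncertain values to manual review. The sketch below assumes a flat `confidence` object keyed by field name with scores between 0 and 1, alongside a `result` object of extracted values. Both key names, the score range, and the 0.8 threshold are assumptions for illustration, and the extract documentation describes the real format.

```python
# Hypothetical response shape: "result" and "confidence" are assumed keys,
# and scores are assumed to be floats between 0 and 1.
sample_response = {
    "result": {"vendor": "Acme", "total": "42.10", "date": "2025-03-01"},
    "confidence": {"vendor": 0.98, "total": 0.61, "date": 0.79},
}


def low_confidence_fields(response, threshold=0.8):
    """Return the names of fields whose score falls below the threshold."""
    scores = response.get("confidence", {})
    return sorted(name for name, score in scores.items() if score < threshold)


needs_review = low_confidence_fields(sample_response)
print(needs_review)
```

The threshold is a judgment call: a lower value sends fewer fields to review but lets more uncertain values through, so it is worth tuning against a set of documents you have already checked by hand.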

Improved schema builder

You can now upload multiple files to build and evaluate each template.

  • Running Suggest schema will analyze all of the example files and attempt to build a generalized schema that covers the most common use cases.
  • Between this and confidence intervals, it’s now way easier to test a variety of different document formats to maximize accuracy.

Better observability

We made some big updates to our Usage view! You can now click into each API request to view and download:

  • A document preview
  • OCR + Extracted values + Confidence scores
  • The full JSON result

Plus a few more improvements: