Run Extract
Extract structured data from a document asynchronously
This is an asynchronous API endpoint. The initial request returns a jobId and status. You can use the jobId to check the processing status and fetch results.
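The polling flow described above can be sketched as follows. This is a minimal sketch, not the official client: `wait_for_job` and `fetch_status` are hypothetical names, and the HTTP call that looks up a job is left to the caller because the status endpoint is not specified in this section. The only assumption about the result is that it carries a `status` field with one of the values `success`, `processing`, or `error`.

```python
import time
from typing import Any, Callable, Dict

# Statuses after which polling should stop (values taken from the Response section).
TERMINAL_STATUSES = {"success", "error"}


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], Dict[str, Any]],  # hypothetical: caller-supplied lookup
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job leaves the 'processing' state or attempts run out."""
    for _ in range(max_attempts):
        result = fetch_status(job_id)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} still processing after {max_attempts} attempts")
```

Passing the lookup as a callable keeps the loop independent of any particular HTTP library and makes it easy to test with canned responses.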
When using templates, you can provide a templateId to load predefined configurations. Any configuration parameters (schema, extractPerPage, etc.) explicitly specified in the API request will override the corresponding template settings.
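The override rule can be illustrated with a small merge function. This is a sketch of the precedence only, under two assumptions that the text does not confirm: overrides replace whole top-level values rather than merging nested ones, and a value of `None` means "not specified in the request". `resolve_config` is a hypothetical helper, not part of the API.

```python
from typing import Any, Dict


def resolve_config(template: Dict[str, Any], request: Dict[str, Any]) -> Dict[str, Any]:
    """Return template settings with explicitly specified request values on top."""
    merged = dict(template)  # start from the template's settings
    # Assumption: None marks a parameter the caller did not specify.
    merged.update({key: value for key, value in request.items() if value is not None})
    return merged


template_settings = {"schema": {"type": "object"}, "extractPerPage": []}
request_settings = {"extractPerPage": ["total"], "schema": None}
effective = resolve_config(template_settings, request_settings)
# extractPerPage comes from the request; schema falls back to the template.
```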
Body Parameters
Exactly one of file or URL is required; do not send both. See Accepted File Types.
URL of the document to extract data from.
The file to extract data from. Use multipart/form-data as the Content-Type header.
The template ID used for extraction.
JSON schema to define the structure of extracted data. See JSON schema examples.
Whether to exclude OCR result from the response. Defaults to false.
Whether to maintain format from the previous page. Defaults to false.
Array of page numbers to process. Defaults to all pages.
Array of schema properties to extract per page. Defaults to empty array.
Whether to bypass the cache and process the document from scratch. Defaults to false.
Whether to extract directly from document images. Defaults to false.
Whether to include confidence intervals. Defaults to false.
Unique identifier for the webhook callback
Custom JSON data to be included in the response
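The rule that a request carries either a file or a URL, but not both, can be checked on the client before sending. The helper below is a hypothetical sketch: `choose_source` and the keys of the dictionary it returns are illustrative and are not field names from the API.

```python
from typing import Dict, Optional


def choose_source(file_path: Optional[str] = None, url: Optional[str] = None) -> Dict[str, str]:
    """Validate that exactly one document source was given and describe it."""
    # True when both are missing or both are present; either case is invalid.
    if (file_path is None) == (url is None):
        raise ValueError("provide exactly one of file or URL")
    if url is not None:
        return {"kind": "url", "value": url}
    # File uploads are sent as multipart/form-data, as stated above.
    return {"kind": "file", "value": file_path, "content_type": "multipart/form-data"}
```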
Example JSON Schema
A JSON Schema defines the structure and validation rules for the extracted JSON. For more examples and details, see JSON Schema Examples.
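As an illustration, a schema for a simple invoice might look like the one built below. The field names (`invoiceNumber`, `total`, `lineItems`) are invented for this example and are not taken from the API; only standard JSON Schema keywords are used.

```python
import json

# Hypothetical schema: an invoice with a number, a total, and a list of line items.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoiceNumber": {"type": "string", "description": "Invoice identifier"},
        "total": {"type": "number", "description": "Total amount due"},
        "lineItems": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
            },
        },
    },
    "required": ["invoiceNumber", "total"],
}

# Serialize to the JSON text that would be passed as the schema parameter.
schema_json = json.dumps(invoice_schema, indent=2)
```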
Response
Unique identifier for the extraction request
Status of the extraction (success, processing, or error)
URL for polling the extraction result