JSON Schema
A guide to JSON Schema
What is JSON Schema?
JSON Schema is a declarative language that allows you to validate, document, and define the structure of JSON data, specifying required fields, data types, and constraints to ensure the data conforms to a specific format.
Basic Concepts
Fields
Fields are the basic building blocks of your schema. Each field represents a piece of information you want to extract.
Field Types
Each field must have a type. The available types are:
string: For text values (names, descriptions, addresses)
number: For numerical values (amounts, counts, measurements)
boolean: For yes/no or true/false values
enum: For values from a predefined list of options
object: For grouping related fields together
array: For lists of items
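To make the six types concrete, here is a minimal Python sketch of what a value of each type looks like once extracted. The helper matches_type and its options parameter are hypothetical names introduced for illustration; they are not part of any documented API.

```python
def matches_type(value, type_name, options=None):
    """Return True if value fits the named field type (illustrative only)."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "enum":
        # options is the predefined list of allowed values
        return value in (options or [])
    if type_name == "object":
        return isinstance(value, dict)
    if type_name == "array":
        return isinstance(value, list)
    raise ValueError(f"unknown field type: {type_name}")


print(matches_type("12 Main St", "string"))
print(matches_type("paid", "enum", options=["paid", "unpaid"]))
```

Note that True is rejected as a number here even though Python treats bool as an integer subtype; that choice keeps the boolean and number types distinct.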
Return As List
return_as_list returns the extracted data as a list, either a list of objects or a list of strings/numbers. This is useful for extracting data from tables or lists.
Extract Per Page
extract_per_page applies the defined schema to each page separately. This is useful for multi-page documents that have the same structure on every page.
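The idea of per-page extraction can be sketched in Python as running the same extraction once per page and keeping the results separate. Everything named here is an assumption made for illustration: extract_each_page, toy_extract, and the {"page": ..., "data": ...} result shape are hypothetical and do not describe the actual output format.

```python
from typing import Any, Callable, Dict, List


def extract_each_page(
    pages: List[str],
    extract: Callable[[str], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Apply the same extraction to every page and keep one result per page."""
    return [
        {"page": number, "data": extract(text)}
        for number, text in enumerate(pages, start=1)
    ]


def toy_extract(text: str) -> Dict[str, Any]:
    """Toy stand-in for a real extractor: reads an 'Invoice: <id>' line."""
    for line in text.splitlines():
        if line.startswith("Invoice:"):
            return {"invoice_id": line.split(":", 1)[1].strip()}
    return {"invoice_id": None}


# Made-up page contents, two pages with the same layout
pages = ["Invoice: A-100\nTotal: 20", "Invoice: A-101\nTotal: 35"]
results = extract_each_page(pages, toy_extract)
print(results)
```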
Examples
Basic Schema
Extract a string value.
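A minimal sketch of such a schema, written as a Python dict and serialized to JSON. It uses standard JSON Schema keywords (type, properties, description, required); the field name company_name is a hypothetical example, not taken from the original page.

```python
import json

# Hypothetical field name; only the "string" type comes from the type list above.
basic_schema = {
    "type": "object",
    "properties": {
        "company_name": {
            "type": "string",
            "description": "Legal name of the company",
        }
    },
    "required": ["company_name"],
}

print(json.dumps(basic_schema, indent=2))
```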
Enum Schema
Extract status based on a list of options.
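A sketch of an enum field as a Python dict. In standard JSON Schema the allowed values are given with the enum keyword on a string field, which is the form assumed here; whether this product instead expects "type": "enum" is not stated in the source. The field name and option values are hypothetical.

```python
import json

# Hypothetical field and options, for illustration only.
status_schema = {
    "type": "object",
    "properties": {
        "status": {
            "type": "string",
            "enum": ["pending", "approved", "rejected"],
            "description": "Current status of the application",
        }
    },
}


def is_allowed_status(value):
    """Check a value against the predefined list of options."""
    return value in status_schema["properties"]["status"]["enum"]


print(json.dumps(status_schema, indent=2))
print(is_allowed_status("approved"))
```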
Object Schema
Extract an address object.
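A sketch of an object field that groups related address fields, using standard JSON Schema nesting via properties. The sub-field names (street, city, state, postal_code) are hypothetical choices for illustration.

```python
import json

# Hypothetical sub-fields; an object groups related fields under one key.
address_schema = {
    "type": "object",
    "properties": {
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "state": {"type": "string"},
                "postal_code": {"type": "string"},
            },
            "required": ["street", "city"],
        }
    },
}

print(json.dumps(address_schema, indent=2))
```

Postal codes are typed as string rather than number so that leading zeros and letters survive.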
List Schema
Extract a list of company names.
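A sketch of a list of strings, expressed with the standard JSON Schema array form (type array plus items). The source does not show how the return_as_list option is written inside a schema, so this sketch deliberately avoids guessing that syntax; the field name company_names is hypothetical.

```python
import json

# Hypothetical field name; "items" describes every element of the list.
company_list_schema = {
    "type": "object",
    "properties": {
        "company_names": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Names of all companies mentioned in the document",
        }
    },
}

print(json.dumps(company_list_schema, indent=2))
```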
Table Schema
Extract a transaction table from account statements.
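A sketch of a table as a list of objects, one object per row, using the standard JSON Schema array-of-objects form. The column names (date, description, amount) and the sample rows are made up for illustration and do not come from the original page.

```python
import json

# Hypothetical columns; each row of the table becomes one object in the list.
transactions_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "date": {"type": "string", "description": "Transaction date as printed"},
            "description": {"type": "string"},
            "amount": {"type": "number"},
        },
        "required": ["date", "description", "amount"],
    },
}

# Made-up rows showing the shape of data this schema describes.
sample_rows = [
    {"date": "2024-01-03", "description": "Card payment", "amount": -42.5},
    {"date": "2024-01-05", "description": "Salary", "amount": 2500.0},
]

# Check that every row carries all required columns.
required = transactions_schema["items"]["required"]
rows_complete = all(all(key in row for key in required) for row in sample_rows)

print(json.dumps(transactions_schema, indent=2))
print(rows_complete)
```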