Classify documents
How to classify documents and route them to the right extraction templates
This tutorial walks you through classifying different document types and applying the appropriate extraction template based on the classification.
Automated Document Classification Workflow
In many business scenarios, you’ll receive various document types through a single channel (like an upload portal) and need to process them differently. Document classification is the crucial first step that determines which extraction template to apply.
Use Case: Financial Document Processing
Let’s explore a common use case: an accounting firm that processes various financial documents for clients:
- Invoices from vendors that need to be recorded and paid
- Receipts for expense reimbursements and tax documentation
- Bank statements showing account activity
- Purchase orders documenting approved purchases
Each document type requires its own data extraction template. Classification allows you to automatically route each document to the appropriate extraction process.
Create document extraction templates
Before classification, create extraction templates for each document type you handle.
For our use case, you would create templates for:
- Invoice extraction template
- Receipt extraction template
- Bank statement extraction template
- Purchase order extraction template
See the basic invoice tutorial and dense table tutorial for examples of creating extraction templates.
Create a classification object
Use the /classification endpoint to create a classification object that can identify your document types.
The response will include an id that you’ll use to classify documents.
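As a rough illustration, the creation request for the financial use case might look like the sketch below. Only the /classification endpoint and the id in the response come from this tutorial. The base URL, the bearer-token header, and the field names `name`, `options`, `label`, and `description` are assumptions made for the example, so check the API reference for the actual schema.

```python
import json
import urllib.request

# Hypothetical values: replace with your real base URL and API key.
API_BASE = "https://api.example.com"
API_KEY = "YOUR_API_KEY"

# Document types from the use case, each described by its distinguishing
# features, plus an OTHER catch-all for anything that does not match.
DOCUMENT_TYPES = {
    "INVOICE": "Vendor bill with line items, quantities, prices, and an amount due",
    "RECEIPT": "Proof of a completed purchase, used for reimbursement or tax records",
    "BANK_STATEMENT": "Periodic account summary listing transactions and balances",
    "PURCHASE_ORDER": "Buyer-issued document authorizing a purchase from a vendor",
    "OTHER": "Any document that does not match the types above",
}


def build_classification_payload(name, document_types):
    """Build the request body. Field names are assumed, not confirmed."""
    return {
        "name": name,
        "options": [
            {"label": label, "description": description}
            for label, description in document_types.items()
        ],
    }


def create_classification(payload):
    """POST the payload to /classification and return the new object's id."""
    request = urllib.request.Request(
        f"{API_BASE}/classification",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # auth scheme is an assumption
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["id"]


# Build and inspect the payload without sending anything over the network.
payload = build_classification_payload("financial-documents", DOCUMENT_TYPES)
print(json.dumps(payload, indent=2))
```

Calling `create_classification(payload)` against a real account would return the id to store for the classification step.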
Run classification on a document
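This step submits a document together with the id returned when the classification object was created, then reads the predicted document type from the response. The sketch below is only a shape for that call: the request path, the `classification_id` and `document_url` fields, and the `label` key in the response are all placeholders rather than the documented API. The HTTP call is passed in as a function so the example runs offline with a stub.

```python
def classify_document(classification_id, document_url, send):
    """Classify one document and return the predicted label.

    `send(path, body)` is a caller-supplied function that POSTs a JSON body
    and returns the decoded JSON response as a dict.
    """
    body = {
        "classification_id": classification_id,  # assumed field name
        "document_url": document_url,            # assumed field name
    }
    # Hypothetical path: substitute the real one from the API reference.
    result = send("/classification/run", body)
    return result["label"]                       # assumed response key


def fake_send(path, body):
    """Stand-in for a real HTTP call so the sketch can run without a server."""
    return {"label": "INVOICE"}


label = classify_document(
    "YOUR_CLASSIFICATION_ID",
    "https://example.com/uploads/document.pdf",
    fake_send,
)
print(label)  # INVOICE
```

In real use, `fake_send` would be replaced by a function that performs the authenticated request.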
Route document to appropriate template
Based on the classification result, route the document to the appropriate extraction template.
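One simple way to do the routing is a lookup table from classification label to extraction template. The sketch below assumes the classification result is available as a plain string label. The label names and template ids are hypothetical placeholders for the four templates created earlier.

```python
# Map each classification label to the id of its extraction template.
# All ids here are placeholders for the templates you created.
TEMPLATE_BY_LABEL = {
    "INVOICE": "tmpl_invoice",
    "RECEIPT": "tmpl_receipt",
    "BANK_STATEMENT": "tmpl_bank_statement",
    "PURCHASE_ORDER": "tmpl_purchase_order",
}


def route_document(label):
    """Return the template id for a label, or None if no template applies.

    OTHER and any unexpected label fall through to None, so the caller can
    send the document to manual review instead of a wrong template.
    """
    return TEMPLATE_BY_LABEL.get(label)


for label in ("INVOICE", "OTHER"):
    template_id = route_document(label)
    if template_id is None:
        print(f"{label}: no template, send to manual review")
    else:
        print(f"{label}: extract with {template_id}")
```

Returning None for unmatched labels keeps the fallback decision in the calling code, which is where a review queue or alert would normally live.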
🎉 You’ve now successfully classified a document and routed it to the appropriate extraction template!
Classification Performance Tips
- Use diverse training samples: When creating a classification object, include examples of each document type with different layouts, formatting, and quality levels.
- Include an OTHER option: Create an OTHER option to handle documents that don’t match any of your predefined types. This helps prevent important documents from being misclassified.
- Be specific with descriptions: Write detailed descriptions for each document type that highlight unique identifying features (e.g., “Invoice: Contains line items with quantities and prices”).
- Update your options regularly: Add new document variations to your classification object as you encounter them to continuously improve accuracy.