This tutorial shows how to classify incoming documents and route each one to the extraction template that matches its type.

Automated Document Classification Workflow

In many business scenarios, you’ll receive various document types through a single channel (like an upload portal) and need to process them differently. Document classification is the crucial first step that determines which extraction template to apply.

Use Case: Financial Document Processing

Consider an accounting firm that processes several kinds of financial documents for its clients:

  • Invoices from vendors that need to be recorded and paid
  • Receipts for expense reimbursements and tax documentation
  • Bank statements showing account activity
  • Purchase orders documenting approved purchases

Each document type requires its own extraction template. Classification lets you route every document to the right extraction process automatically.

1

Create document extraction templates

Before classification, create extraction templates for each document type you handle.

For our use case, you would create templates for:

  • Invoice extraction template
  • Receipt extraction template
  • Bank statement extraction template
  • Purchase order extraction template

See the basic invoice tutorial and dense table tutorial for examples of creating extraction templates.

2

Create a classification object

Use the /classification endpoint to create a classification object that can identify your document types.

const axios = require('axios');

// Shared request options: API key header plus JSON content type
const options = {
  headers: {
    'x-api-key': process.env.OMNI_API_KEY,
    'Content-Type': 'application/json',
  },
};

const data = {
  name: "Financial Document Classifier",
  options: [
    { name: "INVOICE", description: "Document listing items sold with prices" },
    { name: "RECEIPT", description: "Proof of payment with transaction details" },
    { name: "BANK_STATEMENT", description: "Monthly record of account activity" },
    { name: "PURCHASE_ORDER", description: "Document requesting products with agreed terms" },
  ]
};

axios
  .post(`https://api.getomni.ai/classification`, data, options)
  .then((response) => { console.log(response.data) })
  .catch((error) => { console.error(error) });

The response will include an id that you’ll use to classify documents:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "success": true
}
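
As a minimal sketch, the id can be read from the response and checked before it is stored for later calls. The helper name readClassificationId is hypothetical, and treating a missing id or a false success flag as an error is an assumption; only the id and success fields come from the example response above.

```javascript
// Hypothetical helper: extracts the classification id from a /classification response body.
// Assumption: a usable response has `success: true` and a non-empty string `id`,
// as in the example response; any other shape is treated as an error.
function readClassificationId(responseData) {
  if (!responseData || responseData.success !== true) {
    throw new Error('Classification object was not created');
  }
  if (typeof responseData.id !== 'string' || responseData.id.length === 0) {
    throw new Error('Response did not include a classification id');
  }
  return responseData.id;
}

// Example using the response shown above
const classificationId = readClassificationId({
  id: '550e8400-e29b-41d4-a716-446655440000',
  success: true,
});
console.log(classificationId);
```

In the axios call from this step, the helper would be applied to response.data inside the .then handler.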

3

Run classification on a document

Use the /classify endpoint with your classification ID to identify document types.

const data = {
  url: "<URL of the document to classify>",
  classificationId: "<Classification ID>"
}

// Classify a document
axios.post(`https://api.getomni.ai/classify`, data, options)
  .then((response) => { console.log(response.data) })
  .catch((error) => { console.error(error) });

Example response:

{
  "result": {
    "id": "4e3598e3-3ee2-423b-820e-176f4547ac4b",
    "name": "BANK_STATEMENT"
  }
}
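
The document type used in the next step is the name field of the result. The sketch below reads it defensively; readDocumentType and KNOWN_TYPES are hypothetical names, and returning null for a missing or unexpected type is an assumption about how you might want to handle such responses, not documented API behavior.

```javascript
// Document types configured on the classification object in step 2
const KNOWN_TYPES = ['INVOICE', 'RECEIPT', 'BANK_STATEMENT', 'PURCHASE_ORDER'];

// Hypothetical helper: returns the classified type name, or null when the
// response has no result or names a type this workflow does not handle.
// Assumption: the type is found at `result.name`, as in the example response.
function readDocumentType(responseData) {
  const name = responseData && responseData.result && responseData.result.name;
  return KNOWN_TYPES.includes(name) ? name : null;
}

// Example using the response shown above
const documentType = readDocumentType({
  result: { id: '4e3598e3-3ee2-423b-820e-176f4547ac4b', name: 'BANK_STATEMENT' },
});
console.log(documentType); // prints "BANK_STATEMENT"
```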

4

Route document to appropriate template

Based on the classification result, route the document to the appropriate extraction template.

// Map document types to template IDs
const templateMap = {
  'INVOICE': 'uuid_of_invoice_template',
  'RECEIPT': 'uuid_of_receipt_template',
  'BANK_STATEMENT': 'uuid_of_bank_statement_template',
  'PURCHASE_ORDER': 'uuid_of_purchase_order_template'
};

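// documentType is the `result.name` value from the /classify response (e.g. "BANK_STATEMENT")
const templateId = templateMap[documentType];
if (!templateId) {
  throw new Error(`No extraction template mapped for document type: ${documentType}`);
}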

const data = {
  url: "<URL of the document to extract>",
  templateId: templateId
}

// Extract data using the appropriate template
axios.post(`https://api.getomni.ai/extract`, data, options)
  .then((response) => { console.log(response.data) })
  .catch((error) => { console.error(error) });

🎉 You’ve now successfully classified a document and routed it to the appropriate extraction template!
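
Steps 3 and 4 can also be chained into a single function. The following is a sketch under stated assumptions rather than a definitive implementation: classifyAndExtract and pickTemplate are hypothetical names, the post argument is an injected function (for example a thin wrapper around axios.post that resolves with response.data), and returning a null extraction for unmapped types is an illustrative choice.

```javascript
const API_BASE = 'https://api.getomni.ai';

// Hypothetical helper: returns the template id mapped to a document type, or null.
// hasOwnProperty avoids matching inherited keys such as "toString".
function pickTemplate(documentType, templateMap) {
  return Object.prototype.hasOwnProperty.call(templateMap, documentType)
    ? templateMap[documentType]
    : null;
}

// Hypothetical end-to-end helper combining steps 3 and 4.
// `post(url, body)` must send a POST request and resolve with the parsed body, e.g.
//   (url, body) => axios.post(url, body, options).then((res) => res.data)
async function classifyAndExtract(post, documentUrl, classificationId, templateMap) {
  // Step 3: classify the document
  const classification = await post(`${API_BASE}/classify`, {
    url: documentUrl,
    classificationId,
  });
  const documentType =
    (classification && classification.result && classification.result.name) || null;

  // Step 4: look up the template for that type
  const templateId = pickTemplate(documentType, templateMap);
  if (!templateId) {
    // Assumption: unmapped types are returned unextracted for manual handling
    return { documentType, extraction: null };
  }

  const extraction = await post(`${API_BASE}/extract`, { url: documentUrl, templateId });
  return { documentType, extraction };
}
```

Injecting post keeps the routing logic independent of the HTTP client, so it can be exercised with a stub before real documents are sent to the API.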

Classification Performance Tips

  1. Use diverse training samples: When creating a classification object, include examples of each document type with different layouts, formatting, and quality levels.

  2. Include an OTHER option: Create an OTHER option to handle documents that don’t match any of your predefined types. This helps prevent important documents from being misclassified.

  3. Be specific with descriptions: Write detailed descriptions for each document type that highlight unique identifying features (e.g., “Invoice: Contains line items with quantities and prices”).

  4. Update your options regularly: Add new document variations to your classification object as you encounter them to continuously improve accuracy.
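
Tips 2 and 3 might look like the following in practice. This is an illustrative sketch: the description wording is made up for the example, routeFor is a hypothetical helper, and sending OTHER or unmapped documents to manual review is an assumption about how such documents could be handled.

```javascript
// Classification options with specific descriptions plus an OTHER fallback.
// The description wording is illustrative, not taken from the API documentation.
const classifierOptions = [
  { name: 'INVOICE', description: 'Bill from a vendor with line items, quantities, prices, and an amount due' },
  { name: 'RECEIPT', description: 'Proof of a completed payment showing merchant, date, and amount paid' },
  { name: 'BANK_STATEMENT', description: 'Periodic account record listing transactions and opening and closing balances' },
  { name: 'PURCHASE_ORDER', description: 'Buyer-issued request for goods or services with agreed quantities and terms' },
  { name: 'OTHER', description: 'Any document that does not match the types above' },
];

// Hypothetical routing rule: OTHER (or any unmapped type) goes to manual review
// instead of being forced through an extraction template.
function routeFor(documentType, templateMap) {
  const mapped = Object.prototype.hasOwnProperty.call(templateMap, documentType);
  if (documentType === 'OTHER' || !mapped) {
    return { action: 'manual_review' };
  }
  return { action: 'extract', templateId: templateMap[documentType] };
}

console.log(routeFor('OTHER', { INVOICE: 'uuid_of_invoice_template' }));
```

The classifierOptions array would be passed as the options field of the /classification request body from step 2.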