This tutorial walks you through extracting dense tables from account statements using OmniAI’s extract API with the extractPerPage parameter.

Parse dense account statements

Multi-page account statements often contain transaction tables with the same formatting on every page. The extractPerPage parameter handles these tables efficiently: it processes each page separately and then aggregates the results.

Example account statement with dense table

1

Create a document template

Create a document template by uploading an example of your account statement or using our pre-built account statement template.

Bank account statement template
2

Build an extraction schema

For account statements, we’ll extract:

  • account_info object - Account holder details and statement period
  • summary object - Account summary including opening/closing balances
  • transactions array - Transaction details

Here’s our schema structure focusing on the transaction table:
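
As a hypothetical sketch of such a schema, written in JSON Schema style as a JavaScript object, it might look like the following. Only the three top-level properties (account_info, summary, transactions) come from this tutorial. Every nested field name and description is an assumption for illustration, not OmniAI's pre-built account statement template.

```javascript
// Hypothetical schema sketch. Only account_info, summary, and transactions
// are named in the tutorial; all nested field names are assumptions.
const statementSchema = {
  type: 'object',
  properties: {
    account_info: {
      type: 'object',
      description: 'Account holder details and statement period',
      properties: {
        account_holder: { type: 'string', description: 'Name of the account holder' },
        statement_period: { type: 'string', description: 'Period covered by the statement' },
      },
    },
    summary: {
      type: 'object',
      description: 'Account summary including opening/closing balances',
      properties: {
        opening_balance: { type: 'number', description: 'Balance at the start of the period' },
        closing_balance: { type: 'number', description: 'Balance at the end of the period' },
      },
    },
    transactions: {
      type: 'array',
      description: 'One entry per row of the transaction table',
      items: {
        type: 'object',
        properties: {
          date: { type: 'string', description: 'Transaction date as printed on the statement' },
          description: { type: 'string', description: 'Free-text description of the transaction' },
          amount: { type: 'number', description: 'Signed amount; negative for debits' },
          balance: { type: 'number', description: 'Running balance after the transaction' },
        },
      },
    },
  },
};
```

Keeping transactions as its own top-level array is what lets it be treated differently from the document-level objects in the next step.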

3

Configure per-page extraction

To extract dense tables that span multiple pages efficiently, use the extractPerPage parameter so that the transactions array is processed on each page separately.

For account statements with transaction tables spanning multiple pages:

  1. In your template settings, enable the Extract Per Page option for the transactions property
Extract Per Page Configuration

This tells OmniAI to:

  1. Extract the account_info and summary data once from the whole document
  2. Extract transactions from each page separately
  3. Automatically combine all transactions into a single array in the final result
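
The three steps above can be pictured with a small local simulation. This is only a sketch of the idea, not OmniAI's implementation: `combinePerPageResults` is a hypothetical helper, and the sample data is made up for illustration.

```javascript
// Illustrative only: mimics the aggregation described above on local data.
// combinePerPageResults is a hypothetical helper, not an OmniAI API.
function combinePerPageResults(documentLevel, pageResults) {
  // flatMap keeps page order and tolerates pages without a transactions table.
  const transactions = pageResults.flatMap((page) => page.transactions ?? []);
  return { ...documentLevel, transactions };
}

// Made-up sample data: fields extracted once from the whole document...
const documentLevel = {
  account_info: { account_holder: 'Jane Doe' },
  summary: { opening_balance: 100, closing_balance: 65 },
};

// ...and transactions extracted from each page separately.
const pageResults = [
  { transactions: [{ date: '2024-01-02', amount: -25 }] },
  { transactions: [{ date: '2024-01-09', amount: -10 }] },
  {}, // e.g. a final page with no table
];

const combined = combinePerPageResults(documentLevel, pageResults);
console.log(combined.transactions.length); // 2
```

In the real service this combination happens on OmniAI's side; your code only sees the final, merged result.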
4

Test the template

Run the template on an example statement to ensure it’s working as expected. You should see all transactions from all pages combined into a single array.

Dense table extraction results
5

Using the API

Once you’ve saved the template and confirmed it’s working as expected, you can use the API to parse account statements.

You can retrieve your API keys from the settings page.

Node.js
import axios from 'axios';
import dotenv from 'dotenv';

dotenv.config();

// Replace with your own values
const url = '<file-url>';
const templateId = '<template-id>';

const OMNI_API_URL = 'https://api.getomni.ai';
const options = {
  headers: {
    'x-api-key': process.env.OMNI_API_KEY,
    'Content-Type': 'application/json',
  },
};

const createExtraction = async (url, templateId) => {
  const data = {
    url,
    templateId,
  };
  const response = await axios.post(`${OMNI_API_URL}/extract`, data, options);
  return response.data;
};

const getExtractionResult = async (extractionId) => {
  const response = await axios.get(`${OMNI_API_URL}/extract?jobId=${extractionId}`, options);
  return response.data;
};

const runExtraction = async (url, templateId) => {
  try {
    // Step 1: Initiate the extraction
    const extractionData = await createExtraction(url, templateId);

    // Step 2: Get the extraction jobId
    const extractionId = extractionData.jobId;

    // Step 3: Poll for results with a maximum duration
    let result;
    const pollingInterval = 2000; // 2 seconds between attempts
    const maxDurationMs = 180000; // 3 minutes maximum polling duration
    const startTime = Date.now();

    while (Date.now() - startTime < maxDurationMs) {
      // Wait for the specified polling interval
      await new Promise((resolve) => setTimeout(resolve, pollingInterval));
      result = await getExtractionResult(extractionId);

      if (result.status === 'COMPLETE' || result.status === 'ERROR') {
        break;
      }
    }

    return result;
  } catch (error) {
    throw error;
  }
};

runExtraction(url, templateId)
  .then((result) => console.log(JSON.stringify(result, null, 2)))
  .catch((error) => console.error(error));
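
Note that the loop in runExtraction returns the last polled result even when the time limit is reached, so the caller cannot tell a finished job from a timed-out one without checking the status. One possible variant, sketched below, throws on timeout instead. `pollUntilDone` is a hypothetical helper, and the only assumption about the API is the one the sample already makes: the result carries a `status` of 'COMPLETE' or 'ERROR' when the job is done.

```javascript
// Hypothetical helper: fetchResult is any async function that returns an
// object with a `status` field, e.g. () => getExtractionResult(extractionId).
async function pollUntilDone(fetchResult, { intervalMs = 2000, maxDurationMs = 180000 } = {}) {
  const startTime = Date.now();

  while (Date.now() - startTime < maxDurationMs) {
    // Wait before each attempt so the job has time to progress.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    const result = await fetchResult();

    // Both terminal states are returned; the caller decides how to treat ERROR.
    if (result.status === 'COMPLETE' || result.status === 'ERROR') {
      return result;
    }
  }

  // Reaching this point means the job never hit a terminal state in time.
  throw new Error(`Extraction did not finish within ${maxDurationMs} ms`);
}
```

Inside runExtraction, the polling loop could then be replaced with `await pollUntilDone(() => getExtractionResult(extractionId))`.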

Optimization tips for extracting dense tables

  1. Structure your schema carefully: Define the transaction array’s schema with all the fields you need to extract from the table.

  2. Write clear field descriptions: Clear descriptions help the extraction engine understand what data to look for.

  3. Test with multiple documents: Test your template with different account statements to ensure it works across various formats.

  4. Check extraction consistency: Verify that transactions from different pages are formatted consistently in the final result.

  5. Performance benefits: For large documents with many pages, extracting tables per page can be significantly faster and more accurate than extracting from the entire document at once.
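
The consistency check in tip 4 can be partly automated. The sketch below flags transactions that are missing a required field or whose field type differs from the first transaction's. `findInconsistentTransactions` is a hypothetical helper, and the field names and sample rows are assumptions; adapt them to your own schema.

```javascript
// Hypothetical consistency check; field names below are illustrative.
function findInconsistentTransactions(transactions, requiredFields) {
  const problems = [];
  const reference = transactions[0]; // the first row sets the expected types

  transactions.forEach((tx, index) => {
    for (const field of requiredFields) {
      const value = tx[field];
      if (value === undefined || value === null || value === '') {
        problems.push({ index, field, issue: 'missing' });
      } else if (reference[field] != null && typeof value !== typeof reference[field]) {
        problems.push({
          index,
          field,
          issue: `expected ${typeof reference[field]}, got ${typeof value}`,
        });
      }
    }
  });

  return problems;
}

// Made-up sample rows, as if combined from three different pages.
const sample = [
  { date: '2024-01-02', description: 'Coffee', amount: -4.5 },
  { date: '2024-01-09', description: 'Salary', amount: '1,200.00' }, // string, not number
  { date: '2024-01-10', amount: -20 }, // description missing
];

console.log(findInconsistentTransactions(sample, ['date', 'description', 'amount']));
```

An empty array means every row has the required fields with matching types. Anything else points to the page and field worth inspecting, which is often a sign that a field description needs to be more specific.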