Deduplication

The Entity Deduplication API enables you to:

  1. Index your entire entity catalog (e.g., from Elasticsearch or other sources).

  2. Identify entities that are likely to be duplicates, based on various textual fields.

  3. Maintain this deduplication index through incremental (day-to-day) updates.

High-Level Flow

  1. Create an Index

    1. Create or update an index configuration in the Omni system. The configuration defines how entities are stored, which fields are indexed, and the index name you pass to other endpoints (e.g., /v1/entity, /v1/duplicates) to run deduplication queries against that index.

  2. Initial Indexing

    1. Use the /v1/entity/batch endpoint to send your entire catalog in batches. Duplicate detection results become available once the initial indexing completes, which takes roughly 1 hour per 500k entities.

  3. Duplicate Retrieval

    1. After entities are indexed, duplicates can be retrieved in one of two ways:

      1. In the upsert response: duplicates are automatically appended to the entity record when you send a single or batch upsert request.

      2. Via the /v1/duplicates endpoint to fetch duplicates in bulk or by SKU.

  4. Incremental Updates

    1. For day-to-day changes, send new or updated entities using the /v1/entity endpoint (single entity) or the /v1/entity/batch endpoint (multiple entities). The response for each upserted entity includes a list of possible duplicates.
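
The initial indexing step above can be sketched as follows. This is a minimal illustration, not the API's documented request format: the batch size, the body keys ("index", "entities"), and the entity fields ("sku", "title") are assumptions made for the example, and the time estimate assumes the roughly 1 hour per 500k entities figure scales linearly. The sketch only builds the request bodies and does not send them.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple

# Endpoint path taken from the flow described above.
BATCH_ENDPOINT = "/v1/entity/batch"

# Assumption: the maximum batch size is not stated in this document.
BATCH_SIZE = 1000


def chunk(entities: Iterable[Dict], size: int = BATCH_SIZE) -> Iterator[List[Dict]]:
    """Yield successive lists of at most `size` entities."""
    batch: List[Dict] = []
    for entity in entities:
        batch.append(entity)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, partially filled batch.
        yield batch


def build_batch_requests(
    index_name: str, entities: Iterable[Dict], size: int = BATCH_SIZE
) -> List[Tuple[str, str]]:
    """Build one (path, JSON body) pair per batch without sending anything."""
    requests = []
    for batch in chunk(entities, size):
        # Hypothetical body shape: "index" and "entities" are illustrative keys.
        body = json.dumps({"index": index_name, "entities": batch})
        requests.append((BATCH_ENDPOINT, body))
    return requests


def estimate_indexing_hours(entity_count: int) -> float:
    """Rough estimate, assuming ~1 hour per 500k entities scales linearly."""
    return entity_count / 500_000


if __name__ == "__main__":
    # Hypothetical catalog; "sku" and "title" are illustrative field names.
    catalog = [{"sku": f"SKU-{i}", "title": f"Item {i}"} for i in range(2500)]
    reqs = build_batch_requests("my-index", catalog)
    print(len(reqs), "batch requests to", BATCH_ENDPOINT)
    print("estimated indexing time (hours):", estimate_indexing_hours(len(catalog)))
```

The same helper would serve incremental updates: a day's changed entities can be passed through build_batch_requests, or sent one at a time to /v1/entity when only a few records change.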
