The Entity Deduplication API enables you to:
Index your entire entity catalog (e.g., from Elasticsearch or other sources).
Identify entities that are likely to be duplicates, based on various textual fields.
Maintain this deduplication index through incremental (day-to-day) updates.
High-Level Flow
Create an Index
Create or update an index configuration in the Omni system. This endpoint defines how entities are stored, which fields are indexed, and which index name you should pass to other endpoints (e.g., /v1/entity, /v1/duplicates) to run deduplication queries against.
Initial Indexing
Use the /v1/entity/batch endpoint to send your entire catalog in batches. Once the initial indexing completes (roughly 1 hour per 500k entities), duplicate detection results will be available.
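The batching step can be sketched as follows. This is a minimal sketch, not the Omni client: `send_batch` is a hypothetical callable standing in for whatever performs the request to /v1/entity/batch, and the default batch size of 1,000 is an assumption, since the maximum batch size is not stated here. The time estimate simply scales the figure of about 1 hour per 500k entities linearly, which may not hold for very small or very large catalogs.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def chunked(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def index_catalog(
    entities: Iterable[dict],
    send_batch: Callable[[list[dict]], None],
    batch_size: int = 1000,  # assumed; check the actual limit for /v1/entity/batch
) -> int:
    """Send the whole catalog in batches and return the number of entities sent."""
    sent = 0
    for batch in chunked(entities, batch_size):
        send_batch(batch)  # hypothetical: sends the batch to /v1/entity/batch
        sent += len(batch)
    return sent


def estimated_indexing_hours(entity_count: int) -> float:
    """Rough estimate from the ~1 hour per 500k entities figure, scaled linearly."""
    return entity_count / 500_000
```

Passing the sender in as a callable keeps the batching logic independent of any particular HTTP library and makes it easy to dry-run against a list before sending a real catalog.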
Duplicate Retrieval
After entities are indexed, duplicates can be retrieved in one of two ways:
Automatically appended to the entity record if you send a single or batch upsert request.
Via the /v1/duplicates endpoint to fetch duplicates in bulk or by SKU.
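As an illustration of the second option, the sketch below builds a request URL for /v1/duplicates, either for a bulk fetch or for a single SKU. Only the endpoint path comes from this page. The host `omni.example.com` is a placeholder, the query parameter names `index` and `sku` are hypothetical, and the sketch assumes the endpoint accepts query parameters; consult the endpoint reference for the actual request shape.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://omni.example.com"  # placeholder host, not a real endpoint


def duplicates_url(index_name: str, sku: Optional[str] = None) -> str:
    """Build a /v1/duplicates URL for a bulk fetch or a single SKU.

    The query parameter names `index` and `sku` are assumptions.
    """
    params = {"index": index_name}
    if sku is not None:
        params["sku"] = sku  # narrow the query to one entity
    return f"{BASE_URL}/v1/duplicates?{urlencode(params)}"
```

Calling `duplicates_url("catalog")` would represent a bulk fetch for the index named `catalog`, while passing a SKU as the second argument narrows the request to one entity.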
Incremental Updates
For day-to-day changes, send new (or updated) entities using the /v1/entity or /v1/entity/batch endpoints. Each updated entity will return a list of possible duplicates.
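A daily update job might choose between the two endpoints and then collect the duplicates that come back. The sketch below is illustrative only: the rule of using /v1/entity for exactly one change and /v1/entity/batch otherwise is a design assumption, and the `sku` and `duplicates` keys on returned entities are hypothetical, since the response schema is not described here.

```python
def upsert_path(changed: list[dict]) -> str:
    """Pick the single or batch endpoint for a set of changed entities."""
    if not changed:
        raise ValueError("no changed entities to send")
    return "/v1/entity" if len(changed) == 1 else "/v1/entity/batch"


def duplicates_by_sku(response_entities: list[dict]) -> dict[str, list]:
    """Map each returned entity's SKU to its possible duplicates.

    Assumes each returned entity carries hypothetical `sku` and
    `duplicates` keys; the real response schema may differ.
    """
    return {
        entity["sku"]: entity.get("duplicates", [])
        for entity in response_entities
    }
```

Entities that come back without a `duplicates` key are mapped to an empty list, so downstream code can treat "no duplicates found" and "field absent" the same way.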