Data Deduplication

Our workforce lets you quickly deduplicate your data. Algorithms are good at identifying likely duplicates, but a human eye can quickly confirm or dismiss the candidates your algorithm suggests. From matching product SKUs across multiple vendors to reconciling small-business locations, our workforce reviews duplicate candidates and provides definitive decisions.

Common Data Deduplication Jobs

  • E-Commerce Product Data - Our workforce deduplicates product catalogs for retailers and marketplaces. When multiple vendors submit product information to the same marketplace or retailer, duplicate listings are common. Our workers review products side by side, comparing descriptions, attributes, images, prices, and any other information you have that helps identify duplicates.

  • Business Information - Business listings require frequent deduplication because companies acquire business data from so many different sources. Our workforce compares company information, including address, phone number, and website, to identify true duplicates.

  • User Contributed Data - Our workforce deduplicates any dataset built from user contributions. From Q&A threads on a given topic to lists of top city attractions, our workforce helps you keep your information clean and duplicate-free.

How it Works

Our workers are presented with pairs or clusters of duplicate candidates, along with all the data points you have about each item. For duplicate pairs, workers provide a simple “duplicate” or “not duplicate” response after comparing all available information. For clusters, workers sort the cluster's items into groups of matching items. Our workforce of qualified data evaluators can review more than 500,000 pairs or clusters per day. You can send us the data you need deduplicated in batch files or in real time through our application programming interface (API).
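The two task shapes described above can be sketched in code. This is an illustrative sketch only, not our actual platform schema: the function names and the pairwise-judgment callback are assumptions, with a simple rule standing in for a human worker's decision.

```python
# Illustrative sketch of pair and cluster deduplication tasks.
# The judgment callback stands in for a human worker's decision;
# all names here are assumptions for illustration.

from itertools import combinations


def make_pair_tasks(cluster):
    """Expand a cluster of candidate records into pairwise comparison tasks."""
    return [{"left": a, "right": b} for a, b in combinations(cluster, 2)]


def group_cluster(cluster, is_duplicate):
    """Sort a cluster's items into groups of matching items, using a
    pairwise judgment function (a stand-in for a worker's response)."""
    groups = []
    for item in cluster:
        for group in groups:
            # Join the first group containing a confirmed duplicate.
            if any(is_duplicate(item, member) for member in group):
                group.append(item)
                break
        else:
            # No match found: the item starts its own group.
            groups.append([item])
    return groups
```

For example, a three-item cluster expands into three pair tasks, and `group_cluster` with a toy judgment rule that matches any two iPhone listings would separate `["iPhone 13 128GB", "Apple iPhone 13 (128 GB)", "Galaxy S22"]` into an iPhone group and a Galaxy group.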

Advantages Over Traditional Deduplication


Higher Quality

By combining intelligent software with human cognition, you get the benefits of both subjective analysis and algorithmic quality control. Quality is monitored and measured in real time by verifying results with redundant tasks and Gold Standard Tasks (also known as "known answers").
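A minimal sketch of the two quality controls just mentioned, assuming simple majority voting over redundant judgments and worker accuracy scored against gold-standard tasks; the function and field names are illustrative, not our production system.

```python
# Sketch of redundant-task voting and gold-standard scoring.
# Names and data shapes are assumptions for illustration.

from collections import Counter


def majority_decision(judgments):
    """Resolve redundant 'duplicate'/'not duplicate' judgments by majority vote."""
    return Counter(judgments).most_common(1)[0][0]


def gold_accuracy(worker_answers, gold_answers):
    """Share of gold-standard ('known answer') tasks a worker got right."""
    correct = sum(
        1
        for task, answer in worker_answers.items()
        if gold_answers.get(task) == answer
    )
    return correct / len(gold_answers)
```

A worker who matches the known answer on one of two gold tasks would score 0.5, and a pair judged "duplicate" by two of three workers resolves to "duplicate".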


Faster Turnaround

With more than 500,000 skilled workers available to complete data tasks, jobs that used to take months are completed in hours.


On-Demand Scalability

We offer a fully elastic model, letting you scale up and down on demand.

Lower Cost

Our software platform allows workers to move extremely quickly, keeping labor costs low. Our solutions have been shown to achieve higher-quality results at 50% of the cost of traditional methods.

Duplicate Rates of 10-30% Are Not Uncommon

Peter Harvey, CEO of the marketing analytics firm Intellidyn, says that when his firm audits recently “cleaned” customer files from clients, it finds that 5% of the files contain duplicate records. The duplication rate for untouched customer files can be 20% or more.
Guaranteed Accuracy With QualitySmart™ Workflows

We leverage our patent-pending QualitySmart system to ensure the deduplication work we produce is accurate. Our smart digital assembly line routes each pair or cluster of possible duplicates to multiple workers for evaluation. QualitySmart optimizes the number of workers who participate in each task to ensure we meet your required accuracy level.
API and Pricing

Our platform features an API that enables real-time submissions and hands-off management of data deduplication projects for enterprise-level clients. We would like to learn more about your unique deduplication project, so call us today.
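The QualitySmart routing idea described above can be sketched as an adaptive loop: keep assigning the same candidate pair to additional workers until one label reaches a target agreement level or a worker cap is hit. This is a hedged illustration, not our patented algorithm; the thresholds, cap, and judge pool are assumptions.

```python
# Illustrative sketch of adaptive task routing: collect judgments one
# worker at a time and stop early once the leading label's vote share
# meets a target agreement level. All parameters are assumptions.

from collections import Counter


def route_until_confident(judges, pair, target_agreement=0.8, max_workers=5):
    """Return (label, votes) for a duplicate-candidate pair, stopping as
    soon as agreement among at least two workers meets the target."""
    votes = []
    for judge in judges[:max_workers]:
        votes.append(judge(pair))
        label, count = Counter(votes).most_common(1)[0]
        # Require at least two concurring judgments before stopping early.
        if len(votes) >= 2 and count / len(votes) >= target_agreement:
            return label, votes
    # Cap reached without the target agreement: fall back to majority vote.
    return Counter(votes).most_common(1)[0][0], votes
```

With unanimous workers the loop stops after two judgments, while a contested pair keeps drawing workers up to the cap, which mirrors the idea of spending worker effort only where candidates are genuinely ambiguous.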
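A real-time submission through an API like the one mentioned above could look roughly like the following. This is a purely hypothetical sketch: the task type, field names, and callback mechanism are assumptions for illustration, not a documented interface.

```python
# Hypothetical sketch of a real-time pair submission payload.
# Endpoint, fields, and response handling are assumptions.

import json


def build_pair_submission(record_a, record_b, callback_url):
    """Assemble a JSON body for one duplicate-candidate pair."""
    payload = {
        "task_type": "pair_dedup",       # hypothetical task identifier
        "items": [record_a, record_b],   # the two records to compare
        "callback_url": callback_url,    # where the decision would be posted
    }
    return json.dumps(payload)
```

In a batch setting, the same body shape could simply be written out one record pair per line in a file instead of posted individually.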