Tailoring entity resolution for matching product offers

  • Authors: Hanna Köpcke; Andreas Thor; Stefan Thomas; Erhard Rahm

  • Affiliations: Web Data Integration Lab, Leipzig, Germany (all authors)

  • Venue: Proceedings of the 15th International Conference on Extending Database Technology
  • Year: 2012

Abstract

Product matching is a challenging variant of entity resolution that aims to identify product representations and offers referring to the same product. It is difficult because of the broad spectrum of products, the many similar but different products, frequently missing or wrong values, and the textual nature of product titles and descriptions. We propose tailored approaches for product matching that preprocess product offers to extract and clean new attributes usable for matching. In particular, we propose a new approach to extract and use so-called product codes to identify products and distinguish them from similar product variations. We evaluate the effectiveness of the proposed approaches on challenging real-life datasets of product offers from online shops. We also show that the UPC information in product offers is often error-prone and can lead to insufficient match decisions.
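
To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch and not the authors' method, which the abstract does not describe. It assumes a simple heuristic: a product-code candidate is a title token of at least four characters that mixes letters and digits. The function names `extract_code_candidates` and `is_valid_upc_a` are hypothetical. The second function applies the standard UPC-A check-digit rule, which can flag some malformed UPC values but cannot detect a well-formed UPC attached to the wrong product.

```python
import re
from typing import List

# Assumption (not from the paper): tokens are runs of ASCII letters/digits,
# optionally joined by single hyphens, e.g. "SX-230HS".
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def extract_code_candidates(title: str) -> List[str]:
    """Return normalized tokens that look like product codes.

    Hypothetical heuristic: keep tokens that, after removing hyphens,
    have at least four characters and contain both a letter and a digit.
    """
    candidates = []
    for token in TOKEN_RE.findall(title):
        compact = token.replace("-", "")
        has_letter = any(c.isalpha() for c in compact)
        has_digit = any(c.isdigit() for c in compact)
        if len(compact) >= 4 and has_letter and has_digit:
            candidates.append(compact.upper())
    return candidates


def is_valid_upc_a(code: str) -> bool:
    """Check a 12-digit UPC-A string against its check digit."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(c) for c in code]
    odd_sum = sum(digits[0:11:2])   # 1st, 3rd, ..., 11th digits
    even_sum = sum(digits[1:11:2])  # 2nd, 4th, ..., 10th digits
    check = (10 - (3 * odd_sum + even_sum) % 10) % 10
    return check == digits[11]


if __name__ == "__main__":
    title = "Canon PowerShot SX-230HS 12.1 MP Digital Camera"
    print(extract_code_candidates(title))
    print(is_valid_upc_a("036000291452"))
```

Two offers whose titles yield the same normalized candidate could then be treated as likely matches, while offers with differing candidates could be kept apart even when their titles are otherwise very similar. A real pipeline would need additional filtering, since tokens such as capacities or resolutions can also mix letters and digits.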