Matching product titles using web-based enrichment

  • Authors:
  • Vishrawas Gopalakrishnan;Suresh Parthasarathy Iyengar;Amit Madaan;Rajeev Rastogi;Srinivasan Sengamedu

  • Affiliations:
  • SUNY Buffalo, Buffalo, NY, USA;Yahoo Labs, Bangalore, India;TheFind, Inc, Mountain View, CA, USA;Amazon, Bangalore, India;Komli Labs, Bangalore, India

  • Venue:
  • Proceedings of the 21st ACM international conference on Information and knowledge management
  • Year:
  • 2012

Abstract

Matching product titles from different data feeds that refer to the same underlying product entity is a key problem in online shopping. This matching problem is challenging because titles across the feeds have diverse representations, with some missing important keywords like brand and others containing extraneous keywords related to product specifications. In this paper, we propose a novel unsupervised matching algorithm that leverages web search engines to (1) enrich product titles by adding important missing tokens that occur frequently in search results, and (2) compute importance scores for tokens based on their ability to retrieve other (enriched title) tokens in search results. Our matching scheme calculates the cosine similarity between enriched title pairs with tokens weighted by their importance scores. We propose an optimization that exploits the templatized structure of product titles to reduce the number of search queries. In experiments with real-life shopping datasets, we found that our matching algorithm has superior F1 scores compared to IDF-based cosine similarity.
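
The enrichment and weighted-similarity steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the whitespace tokenizer, the `min_fraction` threshold, and the binary token vectors are all assumptions made for illustration. Search results are passed in as plain strings rather than fetched from a real search engine, and the importance scores are taken as a given dictionary, since the abstract does not specify how they are computed in detail.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the paper may differ)."""
    return text.lower().split()


def enrich(title, search_results, min_fraction=0.5):
    """Return the title's token set plus tokens that are frequent in results.

    A token counts as frequent when it appears in at least `min_fraction`
    of the result snippets. The threshold is a hypothetical parameter.
    """
    title_tokens = set(tokenize(title))
    if not search_results:
        return title_tokens
    # Count, for each token, the number of snippets that contain it.
    doc_freq = Counter()
    for snippet in search_results:
        doc_freq.update(set(tokenize(snippet)))
    threshold = min_fraction * len(search_results)
    frequent = {tok for tok, n in doc_freq.items() if n >= threshold}
    return title_tokens | frequent


def weighted_cosine(tokens_a, tokens_b, weight, default_weight=1.0):
    """Cosine similarity of two token sets with per-token importance weights.

    Each title is treated as a vector whose entry for a token is that
    token's weight if the token is present and 0 otherwise.
    """
    def w(tok):
        return weight.get(tok, default_weight)

    dot = sum(w(t) ** 2 for t in tokens_a & tokens_b)
    norm_a = math.sqrt(sum(w(t) ** 2 for t in tokens_a))
    norm_b = math.sqrt(sum(w(t) ** 2 for t in tokens_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical snippets standing in for search engine results.
    snippets = [
        "apple ipod nano 8gb silver",
        "apple ipod nano",
        "ipod nano apple case",
    ]
    enriched = enrich("ipod nano 8gb", snippets)
    print(sorted(enriched))

    # Hypothetical importance scores: brand and model outweigh color.
    weights = {"apple": 2.0, "ipod": 2.0, "silver": 1.0, "black": 1.0}
    print(weighted_cosine({"apple", "ipod", "silver"},
                          {"apple", "ipod", "black"}, weights))
```

In this sketch the missing brand token is recovered from the snippets, and two titles that differ only in a low-weight token still score close to 1, which is the intuition behind weighting tokens by importance rather than treating them equally.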