Semi-supervised learning of attribute-value pairs from product descriptions

  • Authors:
  • Katharina Probst;Rayid Ghani;Marko Krema;Andrew Fano;Yan Liu

  • Affiliations:
  • Accenture Technology Labs, Chicago, IL;Accenture Technology Labs, Chicago, IL;Accenture Technology Labs, Chicago, IL;Accenture Technology Labs, Chicago, IL;Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
  • Year:
  • 2007

Abstract

We describe an approach to extracting attribute-value pairs from product descriptions. This allows us to represent products as sets of attribute-value pairs and so augment product databases. Such a representation is useful for tasks where treating a product as a set of attribute-value pairs works better than treating it as an atomic entity, including product recommendation, product comparison, and demand forecasting. We formulate the extraction as a classification problem and use a semi-supervised algorithm (co-EM) along with a supervised one (Naïve Bayes). The extraction system requires very little initial user supervision: using unlabeled data, we automatically extract an initial seed list that serves as training data for the supervised and semi-supervised classification algorithms. Finally, the extracted attributes and values are linked to form pairs using dependency information and co-location scores. We present promising results on product descriptions in two categories of sporting goods.
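
To make the classification step concrete, the following is a minimal sketch of co-EM with Naïve Bayes, not the authors' implementation. It assumes each candidate word has two feature views (here the token itself and its neighbouring context), and that a few seed labels are given. The view definitions, the function names (train_nb, predict_nb, co_em), the add-one smoothing, and the toy data are all assumptions made for illustration; the abstract specifies none of them.

```python
import math

LABELS = ("attribute", "value")

def train_nb(examples, soft_labels, view):
    """Fit Naive Bayes on one view, weighting each example by its soft label."""
    vocab = sorted({f for ex in examples for f in ex[view]})
    prior = {c: 1.0 for c in LABELS}                       # add-one smoothing (assumed)
    counts = {c: {f: 1.0 for f in vocab} for c in LABELS}  # add-one smoothing (assumed)
    for ex, dist in zip(examples, soft_labels):
        for c in LABELS:
            prior[c] += dist[c]
            for f in ex[view]:
                counts[c][f] += dist[c]
    total_prior = sum(prior.values())
    model = {}
    for c in LABELS:
        total = sum(counts[c].values())
        log_like = {f: math.log(n / total) for f, n in counts[c].items()}
        model[c] = (math.log(prior[c] / total_prior), log_like)
    return model

def predict_nb(model, features):
    """Return a normalized probability distribution over LABELS."""
    scores = {}
    for c in LABELS:
        log_prior, log_like = model[c]
        scores[c] = log_prior + sum(log_like[f] for f in features if f in log_like)
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}

def co_em(examples, seeds, iterations=5):
    """examples: list of (view0_features, view1_features); seeds: index -> label."""
    def clamp(dists):
        # Seed examples always keep their known label.
        out = list(dists)
        for i, label in seeds.items():
            out[i] = {c: 1.0 if c == label else 0.0 for c in LABELS}
        return out

    # Unlabeled examples carry no weight until a classifier has labeled them.
    no_weight = {c: 0.0 for c in LABELS}
    labels = clamp([no_weight] * len(examples))
    for _ in range(iterations):
        for view in (0, 1):
            # Train on one view using the soft labels produced from the other view,
            # then relabel every example probabilistically.
            model = train_nb(examples, labels, view)
            labels = clamp([predict_nb(model, ex[view]) for ex in examples])
    return labels

# Hypothetical toy data: (token view, context view) for four candidate words.
examples = [
    (["color"], ["next:colon"]),      # seed attribute
    (["blue"], ["prev:colon"]),       # seed value
    (["red"], ["prev:colon"]),        # unlabeled
    (["material"], ["next:colon"]),   # unlabeled
]
seeds = {0: "attribute", 1: "value"}
result = co_em(examples, seeds)
for (token, _), dist in zip(examples, result):
    print(token[0], max(dist, key=dist.get))
```

In this toy run the unlabeled words are resolved only through the context view, which they share with a seed; the token view alone cannot separate them. That is the reason for alternating between two views rather than training a single classifier on the seeds.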