Text mining for product attribute extraction

  • Authors: Rayid Ghani; Katharina Probst; Yan Liu; Marko Krema; Andrew Fano

  • Affiliations: Accenture Technology Labs, Chicago, IL; Accenture Technology Labs, Chicago, IL; Carnegie Mellon University, Pittsburgh, PA; Accenture Technology Labs, Chicago, IL; Accenture Technology Labs, Chicago, IL

  • Venue: ACM SIGKDD Explorations Newsletter
  • Year: 2006

Abstract

We describe our work on extracting attribute-value pairs from textual product descriptions. The goal is to augment databases of products by representing each product as a set of attribute-value pairs. Such a representation is beneficial for tasks where treating the product as a set of attribute-value pairs is more useful than treating it as an atomic entity. Examples of such applications include demand forecasting, assortment optimization, product recommendations, and assortment comparison across retailers and manufacturers. We deal with both implicit and explicit attributes and formulate both kinds of extraction as classification problems. Using single-view and multi-view semi-supervised learning algorithms, we are able to exploit the large amounts of unlabeled data present in this domain while reducing the need for initial labeled data, which is expensive to obtain. We present promising results on apparel and sporting goods products and show that our system can accurately extract attribute-value pairs from product descriptions. We describe a variety of applications that are built on top of the results obtained by the attribute extraction system.
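
The abstract does not name the specific learners used, so the following is only a minimal illustrative sketch of the general idea: token labeling cast as classification, with a single-view semi-supervised loop (self-training) that grows a small labeled seed from unlabeled data. The naive Bayes classifier, the feature names (`w=`, `prev=`, `next=`), the label set, the confidence threshold, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical label set: each token is an attribute, a value, or neither.
LABELS = ("attribute", "value", "neither")

def train_nb(examples):
    """Fit a multinomial naive Bayes model on (features, label) pairs."""
    label_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, label in examples:
        label_counts[label] += 1
        for f in feats:
            feat_counts[label][f] += 1
            vocab.add(f)
    return label_counts, feat_counts, vocab

def predict_proba(model, feats):
    """Return a dict mapping each label to its posterior probability."""
    label_counts, feat_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label in LABELS:
        # Add-one smoothing on the prior and on the feature likelihoods;
        # the extra +1 in the denominator reserves mass for unseen features.
        lp = math.log((label_counts[label] + 1) / (total + len(LABELS)))
        denom = sum(feat_counts[label].values()) + len(vocab) + 1
        for f in feats:
            lp += math.log((feat_counts[label][f] + 1) / denom)
        log_scores[label] = lp
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    exp = {l: math.exp(s - top) for l, s in log_scores.items()}
    z = sum(exp.values())
    return {l: v / z for l, v in exp.items()}

def self_train(labeled, unlabeled, threshold=0.5, max_rounds=5):
    """Self-training: repeatedly add confident predictions to the labeled set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train_nb(labeled)
        confident, rest = [], []
        for feats in pool:
            probs = predict_proba(model, feats)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:
                confident.append((feats, best))
            else:
                rest.append(feats)
        if not confident:
            break  # nothing new was learned this round
        labeled.extend(confident)
        pool = rest
    return train_nb(labeled), labeled, pool

# Toy seed and unlabeled pool (made up for this sketch).
SEED = [
    (["w=color", "next=:"], "attribute"),
    (["w=material", "next=:"], "attribute"),
    (["w=blue", "prev=:"], "value"),
    (["w=cotton", "prev=:"], "value"),
    (["w=the", "next=shirt"], "neither"),
]
UNLABELED = [
    ["w=size", "next=:"],
    ["w=red", "prev=:"],
    ["w=a", "next=shirt"],
]

model, labeled, pool = self_train(SEED, UNLABELED)
probs = predict_proba(model, ["w=fabric", "next=:"])
print(max(probs, key=probs.get), len(labeled), len(pool))
```

A multi-view variant such as co-training would, under the same assumptions, train one classifier per feature view (for example, the token itself versus its context) and let each classifier label examples for the other. The threshold controls the trade-off between how quickly the labeled set grows and how much label noise is admitted.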