Semi-supervised learning of attribute-value pairs from product descriptions

Authors:
Katharina Probst;Rayid Ghani;Marko Krema;Andrew Fano;Yan Liu
Affiliations:
Accenture Technology Labs, Chicago, IL;Accenture Technology Labs, Chicago, IL;Accenture Technology Labs, Chicago, IL;Accenture Technology Labs, Chicago, IL;Carnegie Mellon University, Pittsburgh, PA
Venue:
IJCAI'07 Proceedings of the 20th International Joint Conference on Artificial Intelligence
Year:
2007

Abstract

We describe an approach to extracting attribute-value pairs from product descriptions. The extracted pairs let us represent each product as a set of attribute-value pairs, which can be used to augment product databases. This representation helps in tasks where treating a product as a set of attribute-value pairs is more useful than treating it as an atomic entity, such as product recommendation, product comparison, and demand forecasting. We formulate the extraction as a classification problem and use a semi-supervised algorithm (co-EM) along with Naïve Bayes. The extraction system requires very little initial user supervision: using unlabeled data, we automatically extract an initial seed list that serves as training data for the supervised and semi-supervised classification algorithms. Finally, the extracted attributes and values are linked to form pairs using dependency information and co-location scores. We present promising results on product descriptions in two categories of sporting goods.
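
The classification step named in the abstract (co-EM with Naïve Bayes) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-label set, the two views ("word" for the token itself, "context" for its neighbours), the toy data, and the function names train_nb, predict_nb, co_em and classify. The sketch only shows the general co-EM idea: a Naïve Bayes model on one view probabilistically labels the unlabeled data, those soft labels train a model on the other view, and the two alternate.

```python
import math
from collections import defaultdict

LABELS = ("attribute", "value")  # assumed label set, for illustration only


def train_nb(examples, view, alpha=1.0):
    """Fit multinomial Naive Bayes on one view from soft-labeled examples.

    examples: list of (views, dist); views maps view name -> token list,
    dist maps label -> probability (missing labels count as 0).
    """
    prior = {y: alpha for y in LABELS}
    counts = {y: defaultdict(float) for y in LABELS}
    vocab = set()
    for views, dist in examples:
        for y in LABELS:
            weight = dist.get(y, 0.0)
            prior[y] += weight
            for tok in views[view]:
                counts[y][tok] += weight
                vocab.add(tok)
    return prior, counts, vocab, alpha


def predict_nb(model, tokens):
    """Return a normalized label distribution for the given tokens."""
    prior, counts, vocab, alpha = model
    total_prior = sum(prior.values())
    log_scores = {}
    for y in LABELS:
        # The +1 reserves smoothed mass for tokens never seen in training.
        denom = sum(counts[y].values()) + alpha * (len(vocab) + 1)
        score = math.log(prior[y] / total_prior)
        for tok in tokens:
            score += math.log((counts[y].get(tok, 0.0) + alpha) / denom)
        log_scores[y] = score
    top = max(log_scores.values())
    exp = {y: math.exp(s - top) for y, s in log_scores.items()}
    z = sum(exp.values())
    return {y: v / z for y, v in exp.items()}


def co_em(labeled, unlabeled, iterations=5):
    """Alternate between the two views; iterations must be at least 1."""
    seeds = [(views, {label: 1.0}) for views, label in labeled]
    model_word = train_nb(seeds, "word")
    for _ in range(iterations):
        # The word-view model soft-labels the unlabeled data for the context view.
        soft = [(v, predict_nb(model_word, v["word"])) for v in unlabeled]
        model_ctx = train_nb(seeds + soft, "context")
        # The context-view model soft-labels it back for the word view.
        soft = [(v, predict_nb(model_ctx, v["context"])) for v in unlabeled]
        model_word = train_nb(seeds + soft, "word")
    return model_word, model_ctx


def classify(models, views):
    """Combine both views by multiplying their label probabilities."""
    model_word, model_ctx = models
    p_word = predict_nb(model_word, views["word"])
    p_ctx = predict_nb(model_ctx, views["context"])
    combined = {y: p_word[y] * p_ctx[y] for y in LABELS}
    return max(combined, key=combined.get)


# Toy data (invented): "the color is red ." / "the material is leather ."
labeled = [
    ({"word": ["color"], "context": ["L:the", "R:is"]}, "attribute"),
    ({"word": ["red"], "context": ["L:is", "R:."]}, "value"),
]
unlabeled = [
    {"word": ["material"], "context": ["L:the", "R:is"]},
    {"word": ["leather"], "context": ["L:is", "R:."]},
]
models = co_em(labeled, unlabeled)
print(classify(models, {"word": ["material"], "context": ["L:our", "R:feels"]}))
```

In this toy run "material" never appears in the seed data; the word-view model picks it up only because the context-view model soft-labeled its unlabeled occurrence. That is the kind of benefit a semi-supervised step is meant to provide, though the paper's actual features, labels, and seed extraction are not specified in the abstract.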