We describe an approach to extracting attribute-value pairs from product descriptions. Representing products as sets of such pairs allows us to augment product databases, and it is useful for a variety of tasks where treating a product as a set of attribute-value pairs works better than treating it as an atomic entity. Examples of such applications include product recommendation, product comparison, and demand forecasting. We formulate the extraction as a classification problem and use a semi-supervised algorithm (co-EM) along with Naïve Bayes. The extraction system requires very little initial user supervision: using unlabeled data, we automatically extract an initial seed list that serves as training data for the supervised and semi-supervised classification algorithms. Finally, the extracted attributes and values are linked to form pairs using dependency information and co-location scores. We present promising results on product descriptions in two categories of sporting goods.
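To make the classification step more concrete, the following is a minimal sketch of co-EM with Naïve Bayes as the base learner, written in plain Python. It is an illustration under assumptions, not the system described above: the two-label set, the choice of views (the candidate word itself versus its neighboring words), the feature names such as `prev=the`, the smoothing constant, and the toy seed and unlabeled examples are all hypothetical.

```python
import math
from collections import defaultdict

LABELS = ("attribute", "value")  # toy label set, assumed for illustration


def train_nb(examples, alpha=1.0):
    """Fit multinomial Naive Bayes on (features, {label: weight}) pairs.

    Weights may be fractional, which lets us train on soft labels.
    """
    class_w = {c: alpha for c in LABELS}
    feat_w = {c: defaultdict(float) for c in LABELS}
    vocab = set()
    for feats, dist in examples:
        vocab.update(feats)
        for c in LABELS:
            w = dist.get(c, 0.0)
            class_w[c] += w
            for f in feats:
                feat_w[c][f] += w
    return class_w, feat_w, vocab, alpha


def predict_nb(model, feats):
    """Return the posterior distribution over LABELS for one example."""
    class_w, feat_w, vocab, alpha = model
    total = sum(class_w.values())
    log_p = {}
    for c in LABELS:
        denom = sum(feat_w[c].values()) + alpha * len(vocab)
        lp = math.log(class_w[c] / total)
        for f in feats:
            if f in vocab:  # features never seen in training are ignored
                lp += math.log((feat_w[c].get(f, 0.0) + alpha) / denom)
        log_p[c] = lp
    top = max(log_p.values())
    unnorm = {c: math.exp(v - top) for c, v in log_p.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def co_em(labeled, unlabeled, iterations=5):
    """labeled: [((view1, view2), label)]; unlabeled: [(view1, view2)]."""
    hard = [(views, {label: 1.0}) for views, label in labeled]
    seeds1 = [(v1, d) for (v1, _), d in hard]
    seeds2 = [(v2, d) for (_, v2), d in hard]
    model1 = train_nb(seeds1)  # view-1 classifier starts from seeds only
    model2 = None
    for _ in range(iterations):
        # View 1 soft-labels the unlabeled data; view 2 trains on those labels.
        soft1 = [predict_nb(model1, v1) for v1, _ in unlabeled]
        model2 = train_nb(
            seeds2 + [(v2, d) for (_, v2), d in zip(unlabeled, soft1)]
        )
        # View 2 soft-labels the unlabeled data; view 1 retrains on them.
        soft2 = [predict_nb(model2, v2) for _, v2 in unlabeled]
        model1 = train_nb(
            seeds1 + [(v1, d) for (v1, _), d in zip(unlabeled, soft2)]
        )
    return model1, model2


# Hypothetical toy data: view 1 is the word, view 2 is its context.
labeled = [
    ((["weight"], ["prev=the", "next=is"]), "attribute"),
    ((["ounces"], ["prev=10", "next=."]), "value"),
]
unlabeled = [
    (["length"], ["prev=the", "next=is"]),
    (["inches"], ["prev=27", "next=."]),
    (["length"], ["prev=racket", "next=of"]),
    (["inches"], ["prev=10", "next=long"]),
]

model1, model2 = co_em(labeled, unlabeled)
p_length = predict_nb(model1, ["length"])
p_inches = predict_nb(model1, ["inches"])
print(p_length, p_inches)
```

In this toy run neither "length" nor "inches" appears in the seed list, so the word-view classifier can only learn about them through the soft labels that the context-view classifier assigns to the unlabeled examples. That exchange of probabilistic labels between two views is what distinguishes co-EM from plain self-training.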