Automatic classification of web search results: product review vs. non-review documents

Authors:
Tun Thura Thet;Jin-Cheon Na;Christopher S. G. Khoo
Affiliations:
Wee Kim Wee School of Communication and Information, Nanyang Technological University, Singapore;Wee Kim Wee School of Communication and Information, Nanyang Technological University, Singapore;Wee Kim Wee School of Communication and Information, Nanyang Technological University, Singapore
Venue:
ICADL'07 Proceedings of the 10th international conference on Asian digital libraries: looking back 10 years and forging new frontiers
Year:
2007

Abstract

This study develops an automatic method for identifying product review documents on the Web using the snippets (summary information comprising the URL, title, and summary text) returned by a Web search engine. The aim is to let users extend topical search with genre-based filtering or categorization. First, we applied a common machine learning technique, the Support Vector Machine (SVM), to investigate which snippet features are useful for classification. The best results were obtained using just the title and URL (domain and folder names) of the snippets as phrase terms (n-grams). We then developed a heuristic approach that uses semi-automatically constructed domain knowledge, and found that it performs comparably well, with only a small drop in accuracy. A hybrid approach that combines the machine learning and heuristic approaches performs slightly better than the machine learning approach alone.
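
To make the feature description concrete, the following is a minimal sketch of how phrase terms (n-grams) might be derived from a snippet's title and from the domain and folder names of its URL. It is an illustration under stated assumptions, not the authors' implementation: the function names, the feature prefixes, the choice of unigrams plus bigrams, and the tiny `REVIEW_CUES` set are all hypothetical, and the cue set merely stands in for the semi-automatically constructed domain knowledge mentioned in the abstract.

```python
import re
from urllib.parse import urlparse


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def ngrams(tokens, max_n=2):
    """Return all contiguous n-grams of length 1..max_n, joined by underscores."""
    return ["_".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def snippet_features(title, url, max_n=2):
    """Build prefixed phrase-term features from a snippet's title and URL.

    Only the domain and folder names of the URL are used. The last path
    segment (treated here as the file name) and the query string are ignored.
    """
    parsed = urlparse(url)
    domain_tokens = tokenize(parsed.netloc)
    # Drop the empty string before the leading "/" and the final segment.
    folders = parsed.path.split("/")[1:-1]
    folder_tokens = [t for f in folders for t in tokenize(f)]

    feats = ["title:" + g for g in ngrams(tokenize(title), max_n)]
    feats += ["domain:" + t for t in domain_tokens]
    feats += ["folder:" + g for g in ngrams(folder_tokens, max_n)]
    return feats


# Hypothetical cue terms, for illustration only (not taken from the paper).
REVIEW_CUES = {"title:review", "title:reviews", "folder:review", "folder:reviews"}


def looks_like_review(title, url):
    """Toy heuristic: flag a snippet if any feature matches a cue term."""
    return any(f in REVIEW_CUES for f in snippet_features(title, url))
```

In a machine learning setting, the feature lists returned by `snippet_features` would be converted to vectors and passed to a classifier such as an SVM. A hybrid scheme could combine that classifier's output with a rule like `looks_like_review`, although the abstract does not specify how the two approaches are combined.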