This study develops an automatic method for identifying product review documents on the Web using the snippets (summary information comprising the URL, title, and summary text) returned by a Web search engine. The aim is to let users extend topical search with genre-based filtering or categorization. First, we applied a common machine learning technique, the support vector machine (SVM), to investigate which snippet features are useful for classification. The best results were obtained using only the title and URL (domain and folder names) of the snippets, represented as phrase terms (n-grams). We then developed a heuristic approach that uses semi-automatically constructed domain knowledge and found that it performs comparably well, with only a small drop in accuracy. A hybrid approach that combines the machine learning and heuristic approaches performs slightly better than the machine learning approach alone.
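As a rough illustration of the kind of features described above, the following sketch turns a snippet's title and URL into phrase terms. It is not the study's implementation: the function names (`word_ngrams`, `snippet_features`), the `title:`/`domain:`/`folder:` prefixes, the tokenization rule, and the choice of unigrams plus bigrams are all assumptions made for this example, and the sample title and URL are invented.

```python
import re
from urllib.parse import urlparse


def word_ngrams(tokens, max_n=2):
    """Return all contiguous word n-grams of length 1..max_n, joined by '_'."""
    return [
        "_".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def snippet_features(title, url, max_n=2):
    """Turn a snippet's title and URL into prefixed phrase-term features.

    Hypothetical helper: the prefixes and tokenization are assumptions.
    """
    parsed = urlparse(url)
    title_tokens = re.findall(r"[a-z0-9]+", title.lower())
    domain_tokens = re.findall(r"[a-z0-9]+", parsed.netloc.lower())
    # Folder names are the path segments before the final file name.
    folders = [seg for seg in parsed.path.lower().split("/")[:-1] if seg]
    features = ["title:" + g for g in word_ngrams(title_tokens, max_n)]
    features += ["domain:" + t for t in domain_tokens]
    features += ["folder:" + f for f in folders]
    return features


# Invented example snippet (not taken from the study's data).
feats = snippet_features(
    "Acme X100 camera review",
    "http://www.example.com/reviews/cameras/x100.html",
)
print(feats)
```

Feature lists of this form could then be vectorized and passed to an SVM classifier, or matched against a hand-built list of review-related terms in a heuristic filter. Either use is only one plausible reading of the approaches summarized above.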