Automatic classification of web search results: product review vs. non-review documents
ICADL'07 Proceedings of the 10th international conference on Asian digital libraries: looking back 10 years and forging new frontiers
This study seeks to develop an automatic method for identifying product reviews on the Web using the snippets (summary information) returned by search engines. Determining whether a snippet is a review or a non-review is challenging, since a snippet usually contains few features useful for identifying review documents. First, we applied a common machine learning technique, the support vector machine (SVM), to investigate which snippet features are useful for the classification. We then employed a heuristic approach that uses domain knowledge and found that it performs as well as the machine learning approach. A hybrid approach that combines the machine learning technique with domain knowledge performs slightly better than the machine learning approach alone.
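To make the heuristic and hybrid ideas concrete, the following is a minimal sketch in Python and not the method used in the study. The abstract does not list its domain-knowledge rules or say how the two approaches are combined, so everything here is an assumption made for illustration: the cue-word set `REVIEW_CUES`, the function names, the `margin_band` threshold, and the fallback rule. The `svm_margin` argument stands in for the signed decision value of any already trained SVM.

```python
import re

# Hypothetical cue words standing in for domain knowledge;
# the source does not give its actual rules.
REVIEW_CUES = {"review", "reviews", "rating", "ratings", "pros", "cons", "verdict"}


def tokenize(snippet):
    """Lowercase the snippet and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", snippet.lower())


def heuristic_is_review(snippet, min_cues=1):
    """Label a snippet as a review if it contains at least min_cues cue words."""
    tokens = set(tokenize(snippet))
    return len(tokens & REVIEW_CUES) >= min_cues


def hybrid_is_review(snippet, svm_margin, margin_band=0.5):
    """Combine a classifier score with the heuristic (assumed combination rule).

    svm_margin is the signed decision value from a trained SVM
    (positive means "review"). When the margin is confidently far from
    zero, the classifier decides; otherwise the heuristic decides.
    """
    if abs(svm_margin) >= margin_band:
        return svm_margin > 0
    return heuristic_is_review(snippet)
```

A call such as `hybrid_is_review("Canon EOS review and ratings", svm_margin=-0.1)` falls inside the uncertainty band, so the cue words decide the label. Other combinations, such as feeding the heuristic output to the SVM as an extra feature, would be equally plausible readings of "hybrid".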