Abnormal remarks on the World Wide Web, such as those involving violence, threats, or superstition, may disturb social order and public morality. Most traditional methods filter a page whenever it contains a keyword from a predefined blacklist. Such methods cannot provide a quantitative measure of how sensitive the content is. In this paper, we propose a utility-based Web content sensitivity mining approach, in which utility is viewed as the measure of how sensitive a page is. This allows Internet regulators to take different actions according to different sensitivity values. We apply our approach to a real-world Web dataset, where it identified a number of sensitive Web pages that traditional frequency-based methods failed to find. By varying the sensitivity values of the keywords, different sets of high-sensitivity keywords were discovered.
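The contrast between blacklist filtering and a utility-style score can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the keyword weights, the thresholds, and the function names (`blacklist_filter`, `page_utility`, `action_for`) are all hypothetical, and the score is simply assumed to be the sum of each keyword's occurrence count times its sensitivity value.

```python
import re
from collections import Counter

# Hypothetical per-keyword sensitivity values; these are illustrative
# assumptions, not values taken from the paper.
SENSITIVITY = {"violence": 5.0, "threat": 3.0, "superstition": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def blacklist_filter(text, keywords=SENSITIVITY):
    """Traditional approach: flag the page if any blacklisted keyword occurs."""
    words = set(tokenize(text))
    return any(k in words for k in keywords)


def page_utility(text, sensitivity=SENSITIVITY):
    """Assumed utility: sum of (keyword count * keyword sensitivity value)."""
    counts = Counter(tokenize(text))
    return sum(counts[k] * w for k, w in sensitivity.items())


def action_for(score, review_at=3.0, block_at=10.0):
    """Map a sensitivity score to an action; thresholds are hypothetical."""
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "review"
    return "allow"


if __name__ == "__main__":
    pages = [
        "A threat of violence, and then more violence.",
        "An old superstition about black cats.",
    ]
    for page in pages:
        score = page_utility(page)
        print(blacklist_filter(page), score, action_for(score))
```

In this sketch both pages trip the blacklist, but only the first reaches the blocking threshold, which is the kind of graded response a quantitative sensitivity measure makes possible. Changing the values in `SENSITIVITY` changes which pages rank as highly sensitive.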