Objectionable content filtering by click-through data

Authors:
Lung-Hao Lee;Yen-Cheng Juan;Hsin-Hsi Chen;Yuen-Hsien Tseng
Affiliations:
National Taiwan University, Taipei, Taiwan ROC;National Taiwan University, Taipei, Taiwan ROC;National Taiwan University, Taipei, Taiwan ROC;National Taiwan Normal University, Taipei, Taiwan ROC
Venue:
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
Year:
2013

Abstract

This paper explores users' browsing intents to predict the category of a user's next access during web surfing, and applies the results to objectionable content filtering. A user's access trail, represented as a sequence of URLs, reveals contextual information about web browsing behavior. We extract behavioral features from each clicked URL (hostname, bag-of-words, gTLD, IP, and port) to build a linear-chain CRF model for context-aware category prediction. Large-scale experiments show that our method achieves an accuracy of 0.9396 for objectionable access identification without requesting the corresponding page content. Error analysis indicates that the proposed model yields a low false positive rate of 0.0571. In real-life filtering simulations, the model achieves a macro-averaged blocking rate of 0.9271 while maintaining a low macro-averaged over-blocking rate of 0.0575 when collaboratively filtering objectionable content as the dynamic web changes over time.
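
The URL-only features named in the abstract (hostname, bag-of-words, gTLD, IP, and port) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `url_features` and `trail_features` are hypothetical, the last hostname label is only a rough stand-in for the gTLD, the tokenization rule and default ports are assumptions, and the linear-chain CRF that would consume these per-URL features is omitted.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Assumed default ports for URLs that do not state one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def url_features(url):
    """Extract page-content-free features from a single clicked URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""

    # IP feature: is the host a literal IP address rather than a domain name?
    try:
        ipaddress.ip_address(host)
        is_ip = True
    except ValueError:
        is_ip = False

    # Rough gTLD proxy (assumption): the last dot-separated label of the host.
    tld = "" if is_ip or "." not in host else host.rsplit(".", 1)[1]

    # Port feature: explicit port if present, else the scheme's default.
    try:
        port = parts.port
    except ValueError:  # malformed or out-of-range port in the URL
        port = None
    if port is None:
        port = DEFAULT_PORTS.get(parts.scheme)

    # Bag-of-words (assumption): lowercase alphanumeric runs from the
    # host, path, and query string.
    text = f"{host} {parts.path} {parts.query}".lower()
    tokens = [t for t in re.split(r"[^a-z0-9]+", text) if t]

    return {"hostname": host, "tld": tld, "is_ip": is_ip,
            "port": port, "tokens": tokens}


def trail_features(urls):
    """Map an access trail (ordered URLs) to a sequence of feature dicts."""
    return [url_features(u) for u in urls]
```

A sequence labeler such as a linear-chain CRF would take the output of `trail_features` as one observation sequence, with one category label per URL, so that the prediction for each access can depend on the accesses around it.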