This paper explores users' browsing intents to predict the category of a user's next access during web surfing, and applies the results to objectionable content filtering. A user's access trail, represented as a sequence of URLs, reveals contextual information about web browsing behavior. We extract behavioral features from each clicked URL (hostname, bag-of-words, gTLD, IP, and port) to build a linear-chain CRF model for context-aware category prediction. Large-scale experiments show that our method achieves an accuracy of 0.9396 for objectionable access identification without requesting the corresponding page content. Error analysis indicates a low false positive rate of 0.0571. In real-life filtering simulations, the model achieves a macro-averaged blocking rate of 0.9271 while maintaining a low macro-averaged over-blocking rate of 0.0575 when collaboratively filtering objectionable content as the dynamic web changes over time.
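
As a rough illustration of the URL-only feature extraction described in the abstract, the following Python sketch turns an access trail into one feature dictionary per URL, a shape that sequence-labeling toolkits commonly accept. It is not the authors' implementation. The function names `url_features` and `trail_features`, the feature keys, and the tokenization rule are all hypothetical. Two simplifications are assumed: the last hostname label stands in for the gTLD, and the "IP" feature only flags whether the hostname is a literal IP address (no DNS lookup is performed).

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Assumed default ports, used when the URL does not state one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def url_features(url):
    """Build a feature dict for one URL without fetching the page."""
    parts = urlsplit(url)
    host = parts.hostname or ""

    # Flag hostnames that are literal IP addresses (assumption: no DNS lookup).
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    # Simplification: treat the last hostname label as the top-level domain.
    tld = "" if host_is_ip or "." not in host else host.rsplit(".", 1)[1]

    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)

    # Bag-of-words: lowercase alphanumeric tokens from host, path, and query.
    text = " ".join([host, parts.path, parts.query]).lower()
    tokens = [t for t in re.split(r"[^a-z0-9]+", text) if t]

    features = {
        "hostname": host,
        "tld": tld,
        "host_is_ip": host_is_ip,
        "port": str(port),
    }
    for token in tokens:
        features["word=" + token] = 1
    return features


def trail_features(urls):
    """Map an access trail (ordered list of URLs) to a feature sequence."""
    return [url_features(u) for u in urls]
```

A sequence produced by `trail_features` could then be paired with per-URL category labels to train a linear-chain CRF in a toolkit of the reader's choice; the training step is omitted here because the abstract does not specify it.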