WebAngels Filter: A Violent Web Filtering Engine Using Textual and Structural Content-Based Analysis
ICDM '08 Proceedings of the 8th industrial conference on Advances in Data Mining: Medical Applications, E-Commerce, Marketing, and Theoretical Aspects
A Comprehensive Study of Features and Algorithms for URL-Based Topic Classification
ACM Transactions on the Web (TWEB)
Collaborative cyberporn filtering with collective intelligence
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Collaborative blacklist generation via searches-and-clicks
Proceedings of the 20th ACM international conference on Information and knowledge management
Web objectionable text content detection using topic modeling technique
Expert Systems with Applications: An International Journal
Objectionable content filtering by click-through data
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
CALA: An unsupervised URL-based web page classification system
Knowledge-Based Systems
By analyzing a set of attempts by teenagers to access pornographic websites, we found that more than half of them were image searches or visits to websites containing little text. Textual content-based filters cannot correctly categorize such access attempts, because there is little or no text for them to analyze. This paper describes a novel URL-based approach to categorizing objectionable content and its application to web filtering. In this approach, each URL is broken into n-grams for a range of values of n, and a machine learning algorithm is then trained on this n-gram representation to learn a classifier of pornographic websites. We show empirically that the URL-based approach correctly identifies many objectionable web pages. We also demonstrate that, in a production environment, the best filtering results are achieved when the URL-based approach is combined with a content-based approach.
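The pipeline described above (URL, then n-grams over a range of n, then a learned classifier) can be illustrated with a minimal sketch. It assumes character-level n-grams with n from 3 to 5, a simple normalization step (lowercasing, dropping the scheme and a leading "www."), and a multinomial Naive Bayes classifier with Laplace smoothing. The abstract does not say which n-gram unit, range of n, or learning algorithm the authors used, so all of these choices, along with the names `url_ngrams` and `NgramNaiveBayes`, the labels, and the toy URLs, are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def url_ngrams(url, n_min=3, n_max=5):
    """Return a Counter of character n-grams of a normalized URL, for n in [n_min, n_max]."""
    # Assumed normalization: lowercase, then strip the scheme and a leading "www.".
    u = url.lower()
    u = re.sub(r"^[a-z]+://", "", u)
    u = re.sub(r"^www\.", "", u)
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(u) - n + 1):
            grams[u[i:i + n]] += 1
    return grams


class NgramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over URL n-gram counts (Laplace smoothing)."""

    def __init__(self, n_min=3, n_max=5, alpha=1.0):
        self.n_min, self.n_max, self.alpha = n_min, n_max, alpha
        self.class_docs = Counter()            # number of training URLs per class
        self.class_grams = defaultdict(Counter)  # n-gram counts per class
        self.class_total = Counter()           # total n-gram occurrences per class
        self.vocab = set()

    def fit(self, urls, labels):
        for url, label in zip(urls, labels):
            grams = url_ngrams(url, self.n_min, self.n_max)
            self.class_docs[label] += 1
            self.class_grams[label].update(grams)
            self.class_total[label] += sum(grams.values())
            self.vocab.update(grams)
        return self

    def log_scores(self, url):
        """Log prior plus log likelihood of the URL's n-grams, for each class."""
        grams = url_ngrams(url, self.n_min, self.n_max)
        n_docs = sum(self.class_docs.values())
        v = len(self.vocab)
        scores = {}
        for label in self.class_docs:
            score = math.log(self.class_docs[label] / n_docs)
            denom = self.class_total[label] + self.alpha * v
            for g, c in grams.items():
                if g not in self.vocab:  # ignore n-grams never seen in training
                    continue
                score += c * math.log((self.class_grams[label][g] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, url):
        scores = self.log_scores(url)
        return max(scores, key=scores.get)


# Invented toy training data, for illustration only.
train_urls = [
    "http://xxx-videos.example.com/adult/gallery",
    "http://www.adultpics.example.net/xxx",
    "http://news.example.org/world/politics",
    "https://www.example.edu/courses/math",
]
train_labels = ["objectionable", "objectionable", "benign", "benign"]

model = NgramNaiveBayes().fit(train_urls, train_labels)
print(model.predict("http://adult-xxx.example.com/pics"))    # → objectionable
print(model.predict("https://news.example.org/science"))     # → benign
```

Because the features come only from the URL string, a sketch like this can score an image search or a text-poor page before any content is fetched. In a deployment it would presumably be one signal alongside a content-based filter; how the paper combines the two is not described in this abstract.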