As the web expands exponentially, the internet is flooded with pornographic web sites, so effective web filtering systems are essential. Web filtering based on hypertext classification has become one of the important techniques for handling and filtering inappropriate information on the web. Hypertext classification, that is, the automatic classification of web documents into predefined classes, relieves humans of that task. However, improving the performance of hypertext classification on noisy data remains a challenging problem. In this paper, we propose a new approach to hypertext classification for web filtering, which uses a novel combination of Support Vector Machine and K-nearest neighbor (KNN-SVM) to remove noisy training examples. The experimental results show that generalization performance and classification accuracy improve significantly over the traditional SVM classifier, and that the approach is suitable for engineering applications.
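The abstract does not spell out how the KNN step removes noisy training examples, so the following is only a minimal sketch of one common reading: drop every training example whose label disagrees with the majority label of its k nearest neighbours, then train the SVM on what remains. The function name `knn_filter`, the value `k=3`, the Euclidean distance, and the toy two-dimensional data are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter
import math


def knn_filter(X, y, k=3):
    """Keep only examples whose label matches the majority label
    of their k nearest neighbours (hypothetical helper, not the
    paper's exact procedure)."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Euclidean distance from example i to every other example.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority_label, _ = Counter(neighbour_labels).most_common(1)[0]
        # An example that disagrees with its neighbourhood is treated as noise.
        if majority_label == yi:
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y


# Toy data (assumed): two well-separated clusters, each containing
# one deliberately mislabelled point (indices 3 and 7).
X = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
    (5.0, 5.0), (5.1, 5.2), (5.2, 5.1), (5.15, 5.15),
]
y = [0, 0, 0, 1, 1, 1, 1, 0]

clean_X, clean_y = knn_filter(X, y, k=3)
print(len(clean_X), clean_y)
```

The cleaned set `clean_X`, `clean_y` would then be passed to any SVM trainer (for example scikit-learn's `SVC`, if that library is available); that step is omitted here to keep the sketch dependency-free.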