2-Way text classification for harmful web documents

Authors:
Youngsoo Kim;Taekyong Nam;Dongho Won
Affiliations:
Network Security Group, Electronics and Telecommunications Research Institute (ETRI), Daejeon, Korea;Network Security Group, Electronics and Telecommunications Research Institute (ETRI), Daejeon, Korea;Information Security Group, School of Information and Communication Engineering, Sungkyunkwan University, Suwon, Gyeonggi-do, Korea
Venue:
ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part II
Year:
2006

Abstract

The openness of the Web allows any user to access almost any type of information. However, some information, such as adult content, is not appropriate for all users, notably children. Even for adults, some content found on abnormal pornographic sites can harm ordinary people's mental health. In this paper, we propose an efficient 2-way text filter for blocking harmful web documents and also present a new criterion for clear classification. The filter screens out 0-grade web texts, which contain no harmful words, using pattern matching against harmful-word dictionaries, and classifies 1-grade, 2-grade, and 3-grade web texts using a machine learning algorithm.
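
The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the learning algorithm, the dictionary contents, or the meaning of each grade, so the word list `HARMFUL_WORDS`, the small multinomial naive Bayes class, and the toy training texts below are all assumptions chosen only to show the control flow (dictionary lookup first, learned classifier second).

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical placeholder dictionary; the paper's actual dictionaries are not given.
HARMFUL_WORDS = {"badword", "worseword", "worstword"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Tiny multinomial naive Bayes, a stand-in for the unspecified learner."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # grade -> token counts
        self.doc_counts = Counter()              # grade -> number of documents
        self.vocab = set()

    def fit(self, docs, grades):
        for text, grade in zip(docs, grades):
            tokens = tokenize(text)
            self.word_counts[grade].update(tokens)
            self.doc_counts[grade] += 1
            self.vocab.update(tokens)

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_grade, best_score = None, -math.inf
        for grade in self.doc_counts:
            counts = self.word_counts[grade]
            # Laplace (add-one) smoothing over the training vocabulary.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[grade] / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_grade, best_score = grade, score
        return best_grade


def classify(text, model):
    """Return 0 if no dictionary word occurs, else the model's grade (1 to 3)."""
    # Way 1: pattern matching against the harmful-word dictionary.
    if not any(tok in HARMFUL_WORDS for tok in tokenize(text)):
        return 0
    # Way 2: only texts that contain harmful words reach the learned classifier.
    return model.predict(text)


# Toy training data (invented for illustration only).
model = NaiveBayes()
model.fit(
    ["badword news report",
     "badword worseword story",
     "badword worseword worstword page"],
    [1, 2, 3],
)
```

With this toy model, a call such as `classify("weather forecast for today", model)` returns 0 without invoking the classifier, which is the efficiency argument for putting the inexpensive dictionary check first.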