Classifying Web Pages by Genre: An n-Gram Approach
WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
In this study, we investigated automatic genre classification for three common types of web pages. In a set of controlled experiments, we examined how three theoretic feature selection measures, a range of feature set sizes, and three machine classifiers affect the accuracy of web page classification. Our results are encouraging. We conclude that for binary classification tasks, at least for these web page genres, satisfactory results can be reached with small content-based feature sets generated with a sound feature selection measure. Furthermore, we found no evidence of interaction between the feature selection measures and the machine classifiers used.
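The pipeline described in the abstract (n-gram features, a feature selection measure, a small feature set, a binary classifier) can be illustrated with a minimal sketch. The abstract does not name the selection measures, the classifiers, or the n-gram type, so everything concrete here is an assumption made for illustration: character trigrams, information gain as the selection measure, a Bernoulli naive Bayes classifier, and the toy texts and function names themselves.

```python
import math


def char_ngrams(text, n=3):
    """Return the character n-grams of a lowercased text (assumed feature type)."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def entropy(pos, neg):
    """Binary entropy, in bits, of a set with `pos` positive and `neg` negative items."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, feature):
    """Information gain of the presence/absence of `feature` w.r.t. binary labels."""
    n = len(docs)
    pos = sum(labels)
    gain = entropy(pos, n - pos)
    with_f = [y for d, y in zip(docs, labels) if feature in d]
    without_f = [y for d, y in zip(docs, labels) if feature not in d]
    for part in (with_f, without_f):
        if part:
            p = sum(part)
            gain -= len(part) / n * entropy(p, len(part) - p)
    return gain


def select_features(texts, labels, k, n=3):
    """Keep the k n-grams with the highest information gain (ties: alphabetical)."""
    docs = [set(char_ngrams(t, n)) for t in texts]
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda f: information_gain(docs, labels, f),
                    reverse=True)
    return ranked[:k]


def train_nb(texts, labels, features, n=3):
    """Train a Bernoulli naive Bayes model restricted to the selected features."""
    docs = [set(char_ngrams(t, n)) for t in texts]
    model = {}
    for y in (0, 1):
        class_docs = [d for d, lab in zip(docs, labels) if lab == y]
        prior = len(class_docs) / len(docs)
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        probs = {f: (sum(f in d for d in class_docs) + 1) / (len(class_docs) + 2)
                 for f in features}
        model[y] = (prior, probs)
    return model


def predict(model, text, n=3):
    """Return the label (0 or 1) with the highest log-posterior score."""
    grams = set(char_ngrams(text, n))
    scores = {}
    for y, (prior, probs) in model.items():
        score = math.log(prior)
        for f, p in probs.items():
            score += math.log(p if f in grams else 1 - p)
        scores[y] = score
    return max(scores, key=scores.get)


# Toy data (hypothetical): label 1 = FAQ-like page, label 0 = shop-like page.
texts = [
    "faq: how do i reset my password?",
    "faq: how do i change my email?",
    "buy now: cheap shoes on sale",
    "buy now: new laptops on sale",
]
labels = [1, 1, 0, 0]

features = select_features(texts, labels, k=5)
model = train_nb(texts, labels, features)
faq_guess = predict(model, "faq: how do i delete my account?")
shop_guess = predict(model, "buy now: phones on sale")
```

On this toy data only five trigrams are kept, which mirrors the abstract's point that a small, well-selected feature set can suffice for a binary task. The sketch says nothing about the accuracy reported in the study.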