In this paper we describe a new approach to the classification of web documents. Most web classification methods are based on the vector space document representation used in information retrieval. Recently, the graph-based web document representation model was shown to outperform the traditional vector representation when used with the k-Nearest Neighbor (k-NN) classification algorithm. Here we suggest a new hybrid approach to web document classification built upon both graph and vector representations. The k-NN algorithm and three benchmark document collections were used to compare this method with the graph-based and vector-based methods individually. The results demonstrate that, in most cases, the hybrid approach outperforms the graph and vector approaches in classification accuracy while significantly reducing classification time.
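For readers unfamiliar with the vector-space baseline mentioned above, the following is a minimal sketch of k-NN classification over term-frequency vectors. It illustrates only the traditional vector approach, not the hybrid method, whose details the abstract does not give. The function names, the choice of cosine similarity, and the toy training set are all assumptions made for illustration.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-split)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_classify(query, training, k=3):
    """Label a query document by majority vote among its k most similar training documents."""
    q = term_vector(query)
    ranked = sorted(
        training,
        key=lambda item: cosine_similarity(q, term_vector(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy collection; a real experiment would use benchmark web documents.
TRAINING = [
    ("football match score goal league", "sports"),
    ("tennis player wins match tournament", "sports"),
    ("stock market shares price trading", "finance"),
    ("bank interest rate loan market", "finance"),
]

if __name__ == "__main__":
    print(knn_classify("goal scored in the league match", TRAINING, k=3))
```

In this sketch every training document is compared against the query, so classification time grows with the size of the collection and the cost of each similarity computation, which is the cost that a cheaper document representation would aim to reduce.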