Web Document Classification Based on Fuzzy k-NN Algorithm

  • Authors:
  • Juan Zhang; Yi Niu; Huabei Nie

  • Venue:
  • CIS '09 Proceedings of the 2009 International Conference on Computational Intelligence and Security - Volume 01
  • Year:
  • 2009

Abstract

Web document classification is an important technique in web mining, and web page classification has been studied extensively since the Internet became a huge repository of information. The k-nearest neighbor (k-NN) algorithm is a simple classifier that assigns a pattern of unknown class to the majority class among its k nearest neighbors of known class under a given distance measure. A main drawback of the method is that every pattern of known class is treated as equally important when the unknown pattern is assigned. Fuzzy k-nearest neighbor (fuzzy k-NN) is an improved variant of k-NN that has been applied successfully to the classification of structured data. This paper presents a web document classification method based on fuzzy k-NN. In the classification process, TF/IDF (term frequency/inverse document frequency) weighting is used to select document features, and membership grades are used to increase accuracy and make the method better suited to real-world data. Experimental results show that its classification performance is better than that of both k-NN and a support vector machine (SVM).
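The abstract names TF/IDF weighting and membership grades but does not state its formulas. As a reference point only, the textbook forms are shown below; the paper's exact variants may differ. Here $\mathrm{tf}_{t,d}$ is the count of term $t$ in document $d$, $N$ is the number of training documents, $\mathrm{df}_t$ is the number of training documents containing $t$, $u_{ij}$ is the membership of the $j$-th nearest neighbor $\mathbf{x}_j$ in class $i$, and $m > 1$ is a fuzzifier, following the fuzzy k-NN rule of Keller, Gray and Givens (1985).

```latex
w_{t,d} = \mathrm{tf}_{t,d} \cdot \log \frac{N}{\mathrm{df}_t}

u_i(\mathbf{x}) =
  \frac{\sum_{j=1}^{k} u_{ij} \, \lVert \mathbf{x} - \mathbf{x}_j \rVert^{-2/(m-1)}}
       {\sum_{j=1}^{k} \lVert \mathbf{x} - \mathbf{x}_j \rVert^{-2/(m-1)}}
```

If all $k$ neighbors were equidistant and had crisp memberships ($u_{ij} \in \{0, 1\}$), $u_i(\mathbf{x})$ would simply be the fraction of neighbors in class $i$, i.e. the ordinary k-NN vote. The inverse-distance weighting is what lets nearer neighbors count more, which addresses the equal-importance drawback of plain k-NN.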
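The pipeline the abstract outlines (TF/IDF features, then fuzzy k-NN producing membership grades) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are pre-tokenized word lists; the IDF is a smoothed variant, log((1+N)/(1+df)) + 1, so that very common terms are not zeroed out; distance is 1 minus cosine similarity; training documents get crisp memberships (1.0 in their own class); and the helper names `tfidf_fit`, `vectorize`, `cosine`, and `fuzzy_knn`, as well as the toy corpus, are hypothetical.

```python
import math
from collections import Counter

# Hypothetical helper: learn smoothed IDF weights from tokenized training docs.
def tfidf_fit(docs):
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption, not necessarily the paper's formula).
    return {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}

# Hypothetical helper: sparse TF-IDF vector; terms unseen in training are dropped.
def vectorize(doc, idf):
    tf = Counter(doc)
    return {t: tf[t] * idf[t] for t in tf if t in idf}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fuzzy_knn(query, train_vecs, train_memberships, k=3, m=2.0, eps=1e-9):
    """Return a membership grade per class for `query` (Keller-style fuzzy k-NN)."""
    # Distance = 1 - cosine similarity, clamped at 0 against rounding error.
    scored = [(max(0.0, 1.0 - cosine(query, v)), u)
              for v, u in zip(train_vecs, train_memberships)]
    neighbors = sorted(scored, key=lambda p: p[0])[:k]
    # Closer neighbors get larger weights; eps avoids division by zero.
    weights = [1.0 / (d + eps) ** (2.0 / (m - 1.0)) for d, _ in neighbors]
    total = sum(weights)
    classes = {c for u in train_memberships for c in u}
    return {c: sum(w * u.get(c, 0.0) for w, (_, u) in zip(weights, neighbors)) / total
            for c in classes}

# Toy usage with made-up documents and labels.
train_docs = [
    "cheap flights hotel booking travel".split(),
    "hotel resort beach travel deals".split(),
    "python code compiler programming".split(),
    "java programming language code".split(),
]
labels = ["travel", "travel", "tech", "tech"]

idf = tfidf_fit(train_docs)
train_vecs = [vectorize(d, idf) for d in train_docs]
memberships = [{label: 1.0} for label in labels]  # crisp initial memberships

scores = fuzzy_knn(vectorize("programming code tutorial".split(), idf),
                   train_vecs, memberships, k=3)
print(max(scores, key=scores.get), scores)
```

Here the query is assigned to `tech`, and because each training membership sums to 1, the returned grades also sum to 1. Unlike a crisp k-NN vote, the output is a grade per class, so a borderline page can be reported together with how uncertain the assignment is; the fuzzifier `m` controls how sharply a neighbor's influence falls off with distance.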