Fast k-NN classifier for documents based on a graph structure

  • Authors:
  • Fernando José Artigas-Fuentes; Reynaldo Gil-García; José Manuel Badía-Contelles; Aurora Pons-Porrata

  • Affiliations:
  • Center of Pattern Recognition and Data Mining, Universidad de Oriente, Santiago de Cuba, Cuba; Center of Pattern Recognition and Data Mining, Universidad de Oriente, Santiago de Cuba, Cuba; Computer Science and Engineering Department, Universitat Jaume I, Castelló, Spain; Center of Pattern Recognition and Data Mining, Universidad de Oriente, Santiago de Cuba, Cuba

  • Venue:
  • CIARP'10 Proceedings of the 15th Iberoamerican congress conference on Progress in pattern recognition, image analysis, computer vision, and applications
  • Year:
  • 2010

Abstract

This paper presents a fast k-nearest-neighbor (k-NN) classifier for documents. Documents are usually represented in a high-dimensional feature space in which terms are treated as features and the weight of each term reflects its importance in the document. Many approaches exist for finding the neighborhood of an object, but their performance degrades drastically as the number of dimensions grows, which prevents their application to documents. The proposed method is based on a graph index structure with a fast search algorithm. Its high selectivity yields classification quality similar to that of the exhaustive classifier while computing only a small number of distances. Our experimental results show that the method can be applied to problems of very high dimensionality, such as text mining.
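
The abstract does not describe the index or the search procedure, so the following is only a generic illustration of the idea of k-NN classification over a neighbor graph, not the authors' method. It assumes documents are sparse term-weight dictionaries compared with cosine similarity, builds a simple m-nearest-neighbor graph by brute force, and runs a best-first search from a fixed entry point. The names `cosine`, `build_graph`, `graph_knn`, and `classify`, the parameter `m`, and the toy data are all hypothetical.

```python
import heapq
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_graph(docs, m):
    """Link every document to its m most similar documents (brute force)."""
    graph = []
    for i, d in enumerate(docs):
        sims = sorted(
            ((cosine(d, e), j) for j, e in enumerate(docs) if j != i),
            reverse=True,
        )
        graph.append([j for _, j in sims[:m]])
    return graph


def graph_knn(query, docs, graph, k, start=0):
    """Best-first search over the graph.

    Returns (indices of the k best documents found,
             number of similarities computed).
    """
    s0 = cosine(query, docs[start])
    seen = {start}
    candidates = [(-s0, start)]  # max-heap via negated similarity
    best = [(s0, start)]         # min-heap holding the current top-k
    while candidates:
        neg_s, node = heapq.heappop(candidates)
        # Stop once the best open candidate cannot improve the top-k.
        if len(best) == k and -neg_s < best[0][0]:
            break
        for nb in graph[node]:
            if nb in seen:
                continue
            seen.add(nb)
            s = cosine(query, docs[nb])
            if len(best) < k:
                heapq.heappush(best, (s, nb))
                heapq.heappush(candidates, (-s, nb))
            elif s > best[0][0]:
                heapq.heapreplace(best, (s, nb))
                heapq.heappush(candidates, (-s, nb))
    top = [j for _, j in sorted(best, reverse=True)]
    return top, len(seen)


def classify(query, docs, labels, graph, k, start=0):
    """Majority vote among the neighbors returned by the graph search."""
    top, n_computed = graph_knn(query, docs, graph, k, start)
    votes = Counter(labels[j] for j in top)
    return votes.most_common(1)[0][0], n_computed


# Toy corpus (hypothetical data, for illustration only).
docs = [
    {"goal": 1.0, "match": 1.0},
    {"goal": 1.0, "team": 1.0},
    {"team": 1.0, "match": 1.0},
    {"stock": 1.0, "market": 1.0},
    {"market": 1.0, "bank": 1.0},
    {"bank": 1.0, "stock": 1.0},
]
labels = ["sport", "sport", "sport", "finance", "finance", "finance"]
graph = build_graph(docs, m=2)
query = {"goal": 1.0, "match": 1.0, "team": 1.0}
label, n_computed = classify(query, docs, labels, graph, k=3)
print(label, n_computed, "of", len(docs), "similarities computed")
```

The search computes a similarity only for the entry point and for documents it reaches through graph edges, which is where the savings over an exhaustive scan would come from. This sketch uses a single fixed entry point, so it can miss the true neighbors when the graph is disconnected or the entry point is poorly chosen; a practical index would need a strategy for both.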