Fast k-NN classifier for documents based on a graph structure

Authors:
Fernando José Artigas-Fuentes;Reynaldo Gil-García;José Manuel Badía-Contelles;Aurora Pons-Porrata
Affiliations:
Center of Pattern Recognition and Data Mining, Universidad de Oriente, Santiago de Cuba, Cuba;Center of Pattern Recognition and Data Mining, Universidad de Oriente, Santiago de Cuba, Cuba;Computer Science and Engineering Department, Universitat Jaume I, Castelló, Spain;Center of Pattern Recognition and Data Mining, Universidad de Oriente, Santiago de Cuba, Cuba
Venue:
CIARP'10 Proceedings of the 15th Iberoamerican congress conference on Progress in pattern recognition, image analysis, computer vision, and applications
Year:
2010

Abstract

In this paper, a fast k-nearest-neighbor (k-NN) classifier for documents is presented. Documents are usually represented in a high-dimensional feature space, where terms are treated as features and the weight of each term reflects its importance in the document. There are many approaches to finding the neighborhood of an object, but their performance decreases drastically as the number of dimensions grows, which prevents their application to documents. The proposed method is based on a graph index structure with a fast search algorithm. Its high selectivity yields a classification quality similar to that of the exhaustive classifier while computing only a small number of distances. Our experimental results show that the method can be applied to problems of very high dimensionality, such as text mining.
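
For orientation, the exhaustive k-NN baseline that the abstract compares against might look like the sketch below. It is a minimal illustration and not the authors' graph-based index: the TF-IDF weighting, the cosine similarity, the majority vote, the toy corpus, and all function names are assumptions made here, since the abstract does not specify them.

```python
import math
from collections import Counter


def fit_idf(train_docs):
    """Compute an inverse document frequency (shifted by 1) per training term."""
    n = len(train_docs)
    df = Counter(term for doc in train_docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def vectorize(doc, idf):
    """Map a tokenized document to a sparse, L2-normalized TF-IDF vector.

    Terms unseen during training are dropped (an assumption of this sketch).
    """
    tf = Counter(term for term in doc if term in idf)
    vec = {term: count * idf[term] for term, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {term: w / norm for term, w in vec.items()} if norm else {}


def cosine(u, v):
    """Dot product of two normalized sparse vectors, i.e. their cosine similarity."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v.get(term, 0.0) for term, w in u.items())


def knn_classify(query, train_vecs, train_labels, k=3):
    """Exhaustive k-NN: score the query against every training vector, then vote."""
    sims = sorted(
        ((cosine(query, vec), label) for vec, label in zip(train_vecs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in sims[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus, for illustration only.
train_docs = [
    "graph index search neighbor".split(),
    "nearest neighbor search index".split(),
    "football match goal team".split(),
    "team wins football league".split(),
]
train_labels = ["retrieval", "retrieval", "sports", "sports"]

idf = fit_idf(train_docs)
train_vecs = [vectorize(doc, idf) for doc in train_docs]
query = vectorize("fast neighbor search".split(), idf)
predicted = knn_classify(query, train_vecs, train_labels, k=3)
```

In this baseline every query costs one similarity computation per training document. That per-query count is the quantity an index with high selectivity, such as the one the abstract proposes, is meant to reduce.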