A High-Dimensional Access Method for Approximated Similarity Search in Text Mining
ICPR '10 Proceedings of the 2010 20th International Conference on Pattern Recognition
In this paper, a fast k-nearest-neighbors (k-NN) classifier for documents is presented. Documents are usually represented in a high-dimensional feature space, where terms are treated as features and the weight of each term reflects its importance in the document. Many approaches exist for finding the neighborhood of an object, but their performance degrades drastically as the number of dimensions grows, which limits their applicability to documents. The proposed method is based on a graph index structure with a fast search algorithm. Its high selectivity allows it to achieve classification quality similar to that of the exhaustive classifier while computing only a small number of distances. Our experimental results show that the method can be applied to problems of very high dimensionality, such as text mining.
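The exhaustive k-NN classifier that serves as the quality reference can be sketched as follows. This is a minimal illustration under two assumptions, because the abstract specifies neither the weighting scheme nor the dissimilarity measure:

- documents are stored as sparse `{term: weight}` dictionaries;
- cosine distance is used to compare them.

The names `cosine_distance`, `DistanceCounter` and `knn_classify` are hypothetical. The counter makes explicit the cost the proposed graph index is meant to reduce: the exhaustive classifier evaluates one distance per training document for every query. The sketch does not reproduce the authors' graph index or search algorithm.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse {term: weight} vectors.
    (Assumption: the paper's actual dissimilarity may differ.)"""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector for the dot product
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (na * nb)


class DistanceCounter:
    """Wraps a distance function and counts how many times it is evaluated."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return self.fn(a, b)


def knn_classify(query, docs, labels, k, dist):
    """Exhaustive k-NN: compute the distance from the query to every training
    document, keep the k closest and return their majority label."""
    ranked = sorted(range(len(docs)), key=lambda i: dist(query, docs[i]))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    docs = [
        {"goal": 1.0, "match": 1.0},
        {"goal": 0.8, "team": 1.0},
        {"vote": 1.0, "election": 1.0},
        {"vote": 0.5, "party": 1.0},
    ]
    labels = ["sport", "sport", "politics", "politics"]
    dist = DistanceCounter(cosine_distance)
    label = knn_classify({"goal": 1.0, "team": 0.5}, docs, labels, 3, dist)
    print(label, dist.calls)  # prints "sport 4": one distance per training document
```

Every query costs exactly `len(docs)` distance evaluations, regardless of where its neighbors lie. With a large corpus and very high-dimensional sparse vectors, this linear cost is the one an index with high selectivity aims to cut down.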