A High-Dimensional Access Method for Approximated Similarity Search in Text Mining

  • Authors:
  • F. Artigas-Fuentes; R. Gil-Garcia; J. M. Badia-Contelles

  • Venue:
  • ICPR '10 Proceedings of the 2010 20th International Conference on Pattern Recognition
  • Year:
  • 2010

Abstract

In this paper, a new access method for very high-dimensional data spaces is proposed. The method uses a graph structure and pivots to index objects, such as documents in text mining. It also applies a simple search algorithm that uses distance- or similarity-based functions to obtain the k nearest neighbors of new query objects. The method shows good selectivity in very high-dimensional data spaces and better performance than other state-of-the-art methods. Although the method is probabilistic, its error rate is low. The method is evaluated on data sets with thousands of dimensions drawn from the well-known Reuters Corpus Volume 1 (RCV1-v2) collection.
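The abstract does not detail the index or the search procedure. As a rough illustration only, the sketch below shows a generic graph-based approximate k-nearest-neighbor search. Each object is linked to its m nearest neighbors. The search starts at whichever pivot is closest to the query and then greedily expands the closest unexplored nodes under cosine distance. This is not the authors' algorithm. The functions `build_knn_graph` and `graph_knn_search`, the parameters `m` and `ef`, and the choice of cosine distance are all hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; assumes non-zero term vectors (e.g. TF-IDF documents)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def build_knn_graph(data: List[Vector], m: int) -> Dict[int, List[int]]:
    """Hypothetical index: link each object to its m nearest neighbours (brute force, O(n^2))."""
    graph: Dict[int, List[int]] = {}
    for i, u in enumerate(data):
        others = sorted((cosine_distance(u, v), j) for j, v in enumerate(data) if j != i)
        graph[i] = [j for _, j in others[:m]]
    return graph


def graph_knn_search(
    data: List[Vector],
    graph: Dict[int, List[int]],
    pivots: List[int],
    query: Vector,
    k: int,
    ef: int = 10,
) -> List[Tuple[float, int]]:
    """Hypothetical approximate k-NN: greedy best-first search from the closest pivot.

    Returns up to k (distance, index) pairs sorted by distance. The result may miss
    true neighbours that the greedy walk never reaches, so it is approximate.
    """
    ef = max(ef, k)  # keep at least k results during the walk
    start = min(pivots, key=lambda p: cosine_distance(data[p], query))
    d0 = cosine_distance(data[start], query)
    visited = {start}
    candidates = [(d0, start)]  # min-heap: nodes still to expand
    best = [(-d0, start)]       # max-heap via negated distance: best ef results so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break  # closest unexpanded node is worse than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = cosine_distance(data[nb], query)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst result
    return sorted((-nd, i) for nd, i in best)[:k]
```

As with the probabilistic method described in the abstract, this kind of search trades accuracy for speed. A larger `ef` explores more of the graph and lowers the chance of missing a true neighbor. The brute-force graph construction is only suitable for small examples.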