Indexing High-Dimensional Data for Content-Based Retrieval in Large Databases

  • Authors:
  • Manuel J. Fonseca; Joaquim A. Jorge

  • Venue:
  • DASFAA '03 Proceedings of the Eighth International Conference on Database Systems for Advanced Applications
  • Year:
  • 2003

Abstract

Many indexing approaches for high-dimensional data points have evolved into very complex, hard-to-code algorithms. Sometimes this complexity is not matched by an increase in performance. Motivated by these observations, we take a step back and look at simpler approaches to indexing multimedia data. In this paper we propose a simple (not simplistic) yet efficient indexing structure for high-dimensional data points of variable dimension, using dimension reduction. Our approach maps multidimensional points to a 1D line by computing their Euclidean norm, and uses a B+-Tree to store the data points. We exploit the B+-Tree's efficient sequential search to develop simple yet performant methods for point, range and nearest-neighbor queries. To evaluate our technique, we conducted a set of experiments using both synthetic and real data. We analyze creation, insertion and query times as a function of data set size and dimension. Results so far show that our simple scheme outperforms current approaches, such as the Pyramid Technique, the A-Tree and the SR-Tree, for many data distributions. Moreover, our approach seems to scale better with both growing dimensionality and data set size, while exhibiting low insertion and search times.
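The core idea in the abstract (key each point by its Euclidean norm, keep points ordered by that key, answer queries by scanning along the key line) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: a sorted Python list stands in for the B+-Tree's leaf level (no real B+-Tree is used), range queries are treated as ball queries of radius `r`, and the class `NormIndex` and its method names are hypothetical. Pruning relies on the reverse triangle inequality |norm(p) - norm(q)| <= dist(p, q). The authors' actual query algorithms may differ in detail.

```python
import bisect
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def norm(p: Sequence[float]) -> float:
    """Euclidean norm of p: the 1-D key each point is mapped to."""
    return math.sqrt(sum(x * x for x in p))


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, used to filter candidates found via the key."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NormIndex:
    """Hypothetical index: points ordered by norm. A sorted list stands in
    for the B+-Tree leaf level (an assumption, not a real B+-Tree)."""

    def __init__(self) -> None:
        self._keys: List[float] = []    # sorted norms
        self._points: List[Point] = []  # points, aligned with _keys

    def insert(self, p: Sequence[float]) -> None:
        k = norm(p)
        i = bisect.bisect_right(self._keys, k)
        self._keys.insert(i, k)
        self._points.insert(i, tuple(p))

    def point_query(self, q: Sequence[float]) -> bool:
        # Only points whose key equals norm(q) can be equal to q.
        k = norm(q)
        lo = bisect.bisect_left(self._keys, k)
        hi = bisect.bisect_right(self._keys, k)
        return tuple(q) in self._points[lo:hi]

    def range_query(self, q: Sequence[float], r: float) -> List[Point]:
        # |norm(p) - norm(q)| <= dist(p, q), so every point within r of q
        # has a key in [norm(q) - r, norm(q) + r]: scan that interval only.
        kq = norm(q)
        lo = bisect.bisect_left(self._keys, kq - r)
        hi = bisect.bisect_right(self._keys, kq + r)
        return [p for p in self._points[lo:hi] if dist(q, p) <= r]

    def nearest(self, q: Sequence[float]) -> Optional[Point]:
        # Walk outward from norm(q) in both directions, always taking the
        # side whose key is closer; stop once the key gap alone is at least
        # the best distance found, since no remaining point can beat it.
        if not self._points:
            return None
        kq = norm(q)
        right = bisect.bisect_left(self._keys, kq)
        left = right - 1
        best: Optional[Point] = None
        best_d = math.inf
        while left >= 0 or right < len(self._keys):
            gap_l = kq - self._keys[left] if left >= 0 else math.inf
            gap_r = self._keys[right] - kq if right < len(self._keys) else math.inf
            if min(gap_l, gap_r) >= best_d:
                break
            if gap_l <= gap_r:
                i, left = left, left - 1
            else:
                i, right = right, right + 1
            d = dist(q, self._points[i])
            if d < best_d:
                best, best_d = self._points[i], d
        return best


if __name__ == "__main__":
    idx = NormIndex()
    for p in [(0, 0), (3, 4), (1, 1), (6, 8), (0, 5)]:
        idx.insert(p)
    print(idx.point_query((3, 4)))       # True
    print(idx.range_query((0, 0), 2))    # [(0, 0), (1, 1)]
    print(idx.nearest((5, 5)))           # (3, 4)
```

Because every point sits on a single key line, each query in this sketch becomes one or two sequential scans starting from norm(q), which mirrors the sequential-access property the abstract attributes to the B+-Tree. Points with similar norms can still be far apart in space (for example (3, 4) and (0, 5) share the key 5), so the exact distance filter after the key scan remains necessary.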