Re-designing distance functions and distance-based applications for high dimensional data

  • Authors: Charu C. Aggarwal
  • Affiliations: IBM T. J. Watson Research Center, Yorktown Heights, NY
  • Venue: ACM SIGMOD Record
  • Year: 2001

Abstract

In recent years, the detrimental effects of the curse of high dimensionality have been studied in great detail for several problems, such as clustering, nearest neighbor search, and indexing. In high-dimensional space the data becomes sparse, and traditional indexing and algorithmic techniques fail from the performance perspective. Recent research results show that in high-dimensional space, the concept of proximity may not even be qualitatively meaningful [6]. In this paper, we outline the effects of generalizing low-dimensional techniques to high-dimensional applications and the natural effects of sparsity on distance-based applications. We outline the guidelines required to re-design either the distance functions or the distance-based applications in a meaningful way for high-dimensional domains. We provide novel perspectives and insights on some new lines of work for broadening application definitions in order to deal effectively with the dimensionality curse.
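The claim that proximity loses meaning in high dimensions can be illustrated with a small experiment. The sketch below is not taken from the paper: it assumes uniformly random data in the unit hypercube, the Euclidean distance, and a hypothetical helper `relative_contrast` that reports (Dmax - Dmin) / Dmin for the distances from one query point to a random sample. Under these assumptions the contrast between the farthest and nearest point shrinks as the dimensionality grows, which is one way to see why nearest neighbor queries become less discriminating.

```python
import math
import random


def relative_contrast(dim, n_points=500, seed=0):
    """Return (Dmax - Dmin) / Dmin for Euclidean distances from a random
    query point to n_points random points, all drawn uniformly from the
    unit hypercube [0, 1]^dim. Illustrative helper, not from the paper."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # Print the contrast for increasing dimensionality.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

With other data distributions or distance functions the numbers will differ, so this should be read as an illustration of the sparsity effect rather than as a measurement of it.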