Illustrating the curse of dimensionality numerically through different data distribution models

  • Authors:
  • Ian Eccles; Meng Su

  • Affiliations:
  • Penn State University, The Behrend College, Erie, PA; Penn State University, The Behrend College, Erie, PA

  • Venue:
  • ISICT '04 Proceedings of the 2004 international symposium on Information and communication technologies
  • Year:
  • 2004

Abstract

The purpose of this paper is to illustrate numerically a known problem, the curse of dimensionality, in high-dimensional database applications. The curse of dimensionality arises when information is retrieved from index structures, such as the R-tree and its variants, that store feature vectors extracted from multimedia data such as images. The "curse" says that as the dimension of the vectors increases, the cost of retrieving data increases dramatically. Other authors have discussed this problem in some special cases. In this paper, we discuss the distance distributions of vectors under many well-known distribution models. A Java applet, jCurse, has been built to show that the distance distributions of vectors always approach the boundary of the data region under consideration as the dimension increases. We examine these vectors using different distance metrics and distribution models. The program can easily be extended to further distribution models.
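
The boundary effect described in the abstract can be reproduced with a few lines of Java. The following is a minimal sketch and not the jCurse applet itself: the class name `BoundarySketch`, the method names, the choice of a uniform distribution on the unit hypercube, and the margin `eps = 0.05` are all assumptions made for illustration. It estimates the fraction of random points that lie within `eps` of the boundary of the cube [0,1]^d and compares it with the closed-form value 1 - (1 - 2·eps)^d, which tends to 1 as d grows.

```java
import java.util.Random;

class BoundarySketch {

    /**
     * Monte Carlo estimate of the fraction of points, drawn uniformly from
     * the unit hypercube [0,1]^dim, that have at least one coordinate within
     * eps of a face of the cube (assumed model, not taken from the paper).
     */
    static double fractionNearBoundary(int dim, int n, double eps, long seed) {
        Random rng = new Random(seed);
        int near = 0;
        for (int i = 0; i < n; i++) {
            boolean isNear = false;
            for (int k = 0; k < dim; k++) {
                double x = rng.nextDouble();
                if (x < eps || x > 1.0 - eps) {
                    isNear = true;
                }
            }
            if (isNear) {
                near++;
            }
        }
        return (double) near / n;
    }

    /**
     * Closed-form value for the same quantity: a point avoids the boundary
     * layer only if every coordinate falls in [eps, 1 - eps].
     */
    static double exactFraction(int dim, double eps) {
        return 1.0 - Math.pow(1.0 - 2.0 * eps, dim);
    }

    public static void main(String[] args) {
        double eps = 0.05;
        for (int dim : new int[] {1, 2, 10, 50, 100}) {
            System.out.printf("dim=%3d  simulated=%.3f  exact=%.3f%n",
                    dim,
                    fractionNearBoundary(dim, 20000, eps, 42L),
                    exactFraction(dim, eps));
        }
    }
}
```

Under these assumptions, about 10% of the points lie in the boundary layer in one dimension, and almost all of them do by a hundred dimensions. Replacing `rng.nextDouble()` with another sampler would be one way to try a different distribution model.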