The curse of dimensionality in data mining and time series prediction

  • Authors:
  • Michel Verleysen;Damien François

  • Affiliations:
  • Machine Learning Group, Université catholique de Louvain, Louvain-la-Neuve, Belgium;Machine Learning Group, Université catholique de Louvain, Louvain-la-Neuve, Belgium

  • Venue:
  • IWANN'05 Proceedings of the 8th International Conference on Artificial Neural Networks: Computational Intelligence and Bioinspired Systems
  • Year:
  • 2005

Abstract

Modern data analysis tools have to work on high-dimensional data, whose components are not independently distributed. High-dimensional spaces show surprising, counter-intuitive geometrical properties that have a large influence on the performance of data analysis tools. Among these properties, the concentration of the norm phenomenon means that Euclidean norms and Gaussian kernels, both commonly used in models, become inappropriate in high-dimensional spaces. This paper presents alternative distance measures and kernels, together with geometrical methods to decrease the dimension of the space. The methodology is applied to a typical time series prediction example.
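
The concentration of the norm phenomenon mentioned in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' experiment: it assumes points drawn uniformly from the unit hypercube with independent components (a simplification, since the abstract stresses that real data components are not independent), and the function name `relative_norm_spread` and the sample sizes are hypothetical choices made for illustration. It reports the ratio of the standard deviation to the mean of the Euclidean norm, which shrinks as the dimension grows.

```python
import numpy as np


def relative_norm_spread(dim, n_points=2000, seed=0):
    """Return std/mean of the Euclidean norm of random points in [0, 1]^dim.

    Assumption for illustration only: components are i.i.d. uniform.
    """
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))      # shape (n_points, dim)
    norms = np.linalg.norm(points, axis=1)    # one Euclidean norm per point
    return norms.std() / norms.mean()


# Relative spread of the norm for increasing dimension.
spreads = {dim: relative_norm_spread(dim) for dim in (2, 10, 100, 1000)}

for dim, spread in spreads.items():
    print(f"dim={dim:5d}  std/mean of Euclidean norm = {spread:.4f}")
```

Under these assumptions the mean norm grows roughly like the square root of the dimension while its standard deviation stays nearly constant, so all points end up at almost the same relative distance from the origin. This is one way to see why distance-based measures lose discriminative power in high-dimensional spaces.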