The curse of dimensionality in data mining and time series prediction

Authors:
Michel Verleysen; Damien François
Affiliations:
Machine Learning Group, Université catholique de Louvain, Louvain-la-Neuve, Belgium; Machine Learning Group, Université catholique de Louvain, Louvain-la-Neuve, Belgium
Venue:
IWANN'05 Proceedings of the 8th international conference on Artificial Neural Networks: computational Intelligence and Bioinspired Systems
Year:
2005

Abstract

Modern data analysis tools have to work on high-dimensional data, whose components are not independently distributed. High-dimensional spaces show surprising, counter-intuitive geometrical properties that strongly influence the performance of data analysis tools. Among these properties is the concentration of the norm phenomenon: as the dimension grows, the norms of random vectors (and the distances between them) become nearly equal relative to their magnitude. As a consequence, Euclidean norms and Gaussian kernels, both commonly used in models, become inappropriate in high-dimensional spaces. This paper presents alternative distance measures and kernels, together with geometrical methods to decrease the dimension of the space. The methodology is applied to a typical time series prediction example.
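The concentration of the norm can be observed with a few lines of code. The following is a minimal sketch, assuming i.i.d. uniform data on [0, 1]^d (a deliberately simple setting; real data with dependent components usually concentrates less). It measures the relative contrast (max - min) / min of vector norms as the dimension grows. The fractional Minkowski norm with p = 0.5 is included only as one commonly suggested alternative distance measure from the literature on high-dimensional distances; it is not necessarily the measure proposed in this paper, and all helper names (`minkowski_norm`, `relative_contrast`, `sample_uniform`) are hypothetical. Because the Gaussian kernel depends on the data only through the Euclidean distance, it inherits the same concentration.

```python
import random


def minkowski_norm(x, p):
    """Minkowski p-norm of vector x.

    For p < 1 this is a 'fractional norm'; it is not a true norm because the
    triangle inequality no longer holds, but it can still rank vectors.
    """
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def relative_contrast(points, p):
    """Return (max - min) / min of the p-norms of the given points.

    A value close to 0 means that all points lie at roughly the same
    distance from the origin, i.e. the norm has 'concentrated'.
    """
    norms = [minkowski_norm(x, p) for x in points]
    return (max(norms) - min(norms)) / min(norms)


def sample_uniform(n, dim, rng):
    """Draw n points with i.i.d. uniform [0, 1) coordinates (an assumption)."""
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    for dim in (2, 10, 100, 1000):
        pts = sample_uniform(200, dim, rng)
        rc_l2 = relative_contrast(pts, 2.0)    # Euclidean norm
        rc_frac = relative_contrast(pts, 0.5)  # fractional norm, p = 0.5
        print(f"dim={dim:5d}  contrast L2={rc_l2:.3f}  contrast p=0.5={rc_frac:.3f}")
```

Running the script prints one line per dimension; the Euclidean contrast shrinks steadily as `dim` grows, which is the concentration effect described in the abstract. Whether a fractional norm actually helps on a given data set depends on its distribution, so this is best checked empirically rather than assumed.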