Minimum neighbor distance estimators of intrinsic dimension

Authors:
Gabriele Lombardi, Alessandro Rozza, Claudio Ceruti, Elena Casiraghi, Paola Campadelli
Affiliations:
Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Milano, Italy (all authors)
Venue:
ECML PKDD'11: Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases, Volume Part II
Year:
2011



Abstract

Most machine learning techniques suffer from the "curse of dimensionality" when applied to high-dimensional data. To address this limitation, a common preprocessing step is to apply a dimensionality reduction technique, and a great deal of research in the literature has been devoted to developing algorithms for this task. These techniques often require, as a parameter, the number of dimensions to retain; to set it, one needs to estimate the "intrinsic dimensionality" of the given dataset, that is, the minimum number of degrees of freedom needed to capture all the information carried by the data. Although many estimation techniques have been proposed, most of them fail on noisy data or when the intrinsic dimensionality is high. In this paper we present a family of estimators based on the probability density function of the normalized nearest neighbor distance. We evaluate the proposed techniques on both synthetic and real datasets, comparing their performance with that of state-of-the-art algorithms; the results show that the proposed methods are promising.
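The abstract does not spell out the estimators, so the following is only an illustrative sketch under stated assumptions. It assumes that the normalized nearest neighbor distance of a point x_i is ρ_i = r_1(x_i) / r_{k+1}(x_i), the ratio between the distances to its nearest and to its (k+1)-th nearest neighbor. If the k closest neighbors are uniformly distributed in a d-dimensional ball of radius r_{k+1}(x_i), then P(ρ_i > r) = (1 - r^d)^k, which gives the density g(r; k, d) = k d r^(d-1) (1 - r^d)^(k-1) on (0, 1). The sketch picks the integer d that maximizes the resulting log-likelihood. The function names (`normalized_nn_distances`, `log_likelihood`, `estimate_dimension`), the brute-force neighbor search and the integer maximum-likelihood search are hypothetical choices for this example, not the authors' implementation.

```python
import math
import random


def normalized_nn_distances(points, k):
    """Return rho_i = r_1(x_i) / r_{k+1}(x_i) for every point x_i.

    r_j is the Euclidean distance from x_i to its j-th nearest neighbour.
    Brute force, O(N^2): intended for small illustrative datasets only.
    Assumes no duplicate points (which would give rho_i = 0).
    """
    if len(points) < k + 2:
        raise ValueError("need at least k + 2 points")
    rhos = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        rhos.append(dists[0] / dists[k])  # dists[k] is the (k+1)-th neighbour
    return rhos


def log_likelihood(rhos, k, d):
    """Log-likelihood of rhos under the assumed density
    g(r; k, d) = k * d * r**(d - 1) * (1 - r**d)**(k - 1), 0 < r < 1."""
    total = 0.0
    for r in rhos:
        total += math.log(k * d) + (d - 1) * math.log(r)
        if k > 1:
            total += (k - 1) * math.log1p(-r ** d)
    return total


def estimate_dimension(points, k=5, d_max=20):
    """Return the integer d in 1..d_max that maximises the log-likelihood."""
    rhos = normalized_nn_distances(points, k)
    return max(range(1, d_max + 1), key=lambda d: log_likelihood(rhos, k, d))


if __name__ == "__main__":
    rng = random.Random(0)
    # 500 Gaussian points spanning a 3-dimensional linear subspace of R^6.
    data = [[rng.gauss(0, 1) for _ in range(3)] + [0.0] * 3 for _ in range(500)]
    print("estimated intrinsic dimension:", estimate_dimension(data, k=5))
```

Because the assumed density relies on the neighbors being locally uniform, the sketch works best with a small k and a densely sampled manifold; boundaries and strongly non-uniform sampling can bias the estimate, and a real implementation would replace the quadratic neighbor search with a spatial index.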