Very sparse stable random projections for dimension reduction in lα (0 < α ≤ 2) norm
Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Estimators and tail bounds for dimension reduction in lα (0 < α ≤ 2) using stable random projections
Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms
Dimension amnesic pyramid match kernel
AAAI'08 Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 2
Random Projection RBF Nets for Multidimensional Density Estimation
International Journal of Applied Mathematics and Computer Science - Issues in Fault Diagnosis and Fault Tolerant Control
UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence
The Journal of Machine Learning Research
Distributed high dimensional information theoretical image registration via random projections
Digital Signal Processing
Efficient point-to-subspace query in ℓ1 with application to robust face recognition
ECCV'12 Proceedings of the 12th European Conference on Computer Vision - Volume Part IV
For dimension reduction in the l1 norm, the method of Cauchy random projections multiplies the original data matrix A ∈ ℝ^{n×D} with a random matrix R ∈ ℝ^{D×k} (k ≪ D) whose entries are i.i.d. samples of the standard Cauchy distribution C(0,1). Because of the known impossibility result, one cannot hope to recover the pairwise l1 distances in A from B = A×R ∈ ℝ^{n×k} using linear estimators without incurring large errors. However, nonlinear estimators are still useful for certain applications in data stream computations, information retrieval, learning, and data mining. We study three types of nonlinear estimators: the sample median estimators, the geometric mean estimators, and the maximum likelihood estimators (MLE). We derive tail bounds for the geometric mean estimators and establish that k = O(log n / ε²) suffices, with the constants explicitly given. Asymptotically (as k → ∞), both the sample median and the geometric mean estimators are about 80% efficient compared to the MLE. We also analyze the moments of the MLE and propose approximating its distribution by an inverse Gaussian.
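A minimal sketch of the projection step and the two closed-form estimators described above, using numpy. The data matrix A and the sizes n, D, k are illustrative choices, not from the paper; the key facts used are that each projected difference coordinate is Cauchy with scale equal to the l1 distance, that the median of the absolute value of C(0, d) is d, and that the geometric mean estimator is unbiased after multiplying by the correction factor cos^k(π/(2k)).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: n points in D dimensions, projected down to k dimensions.
n, D, k = 3, 1000, 200

A = rng.standard_normal((n, D))        # original data matrix, n x D
R = rng.standard_cauchy((D, k))        # i.i.d. standard Cauchy C(0,1) entries
B = A @ R                              # projected data, n x k

def sample_median_estimate(bu, bv):
    # Each coordinate of bu - bv is Cauchy with scale ||u - v||_1,
    # and the median of |C(0, d)| equals d.
    return np.median(np.abs(bu - bv))

def geometric_mean_estimate(bu, bv):
    # Geometric mean of |differences|, debiased by the factor cos^k(pi/(2k)).
    d = np.abs(bu - bv)
    kk = d.size
    return np.exp(np.mean(np.log(d))) * np.cos(np.pi / (2 * kk)) ** kk

true_l1 = np.abs(A[0] - A[1]).sum()
est_med = sample_median_estimate(B[0], B[1])
est_gm = geometric_mean_estimate(B[0], B[1])
```

With k = 200 samples, both estimates typically land within a modest relative error of the true l1 distance; the geometric mean estimator is the one for which the paper's tail bounds give k = O(log n / ε²).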