For dimension reduction in l1, one can multiply a data matrix A ∈ R^{n×D} by a random matrix R ∈ R^{D×k} (k ≪ D) whose entries are i.i.d. samples from the standard Cauchy distribution. The known impossibility result implies that the pairwise l1 distances in A cannot be recovered from B = AR ∈ R^{n×k} using linear estimators. However, nonlinear estimators remain useful for certain applications in data stream computation, information retrieval, learning, and data mining. We propose three types of nonlinear estimators: the bias-corrected sample median estimator, the bias-corrected geometric mean estimator, and the bias-corrected maximum likelihood estimator. We derive tail bounds for the geometric mean estimator and establish that k = O(log n/ε^2) suffices, with the constants given explicitly. Asymptotically (as k → ∞), both the sample median estimator and the geometric mean estimator are about 80% efficient relative to the maximum likelihood estimator (MLE). We analyze the moments of the MLE and propose approximating its distribution by an inverse Gaussian.
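To make the construction concrete, here is a minimal Python sketch (dimensions, data, and function names are illustrative, not from the paper) of the Cauchy random projection B = AR together with two of the estimators discussed. For a pair of rows u1, u2 of A, each projected difference x_j = (u1 − u2)·r_j is Cauchy with scale d = ||u1 − u2||_1, so E|x_j|^{1/k} = d^{1/k}/cos(π/(2k)); multiplying the geometric mean of the |x_j| by cos^k(π/(2k)) therefore removes the bias. The sample median estimator is shown in its simple asymptotic form, without the finite-sample bias correction derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: n points in D dimensions, projected to k dimensions.
n, D, k = 5, 10_000, 500

# Hypothetical data matrix A; any real-valued data works here.
A = rng.laplace(size=(n, D))

# Cauchy projection matrix R: entries are i.i.d. standard Cauchy samples.
R = rng.standard_cauchy(size=(D, k))
B = A @ R  # projected data, shape (n, k)

def l1_geometric_mean(b1, b2):
    """Bias-corrected geometric mean estimate of the l1 distance.

    x_j = b1_j - b2_j is Cauchy with scale d = ||u1 - u2||_1, and
    E[prod_j |x_j|^(1/k)] = d / cos(pi/(2k))^k, so scaling by
    cos(pi/(2k))^k makes the estimator unbiased.
    """
    m = len(b1)
    x = np.abs(b1 - b2)
    # Average the logs instead of multiplying: individual Cauchy
    # samples can be huge, and the direct product would overflow.
    log_gm = np.mean(np.log(x))
    return np.exp(log_gm) * np.cos(np.pi / (2 * m)) ** m

def l1_sample_median(b1, b2):
    """Sample median estimate: the median of |Cauchy(0, d)| is d."""
    return np.median(np.abs(b1 - b2))

d_true = np.abs(A[0] - A[1]).sum()
print("true l1 distance:      ", d_true)
print("geometric mean estimate:", l1_geometric_mean(B[0], B[1]))
print("sample median estimate: ", l1_sample_median(B[0], B[1]))
```

Both estimates concentrate around the true distance as k grows; note that a plain average of the x_j would be useless here, since the Cauchy distribution has no finite mean, which is why the nonlinear (median and geometric mean) estimators are needed.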