Nonlinear estimators and tail bounds for dimension reduction in l1 using Cauchy random projections

  • Authors and affiliations:
  • Ping Li, Department of Statistics, Stanford University, Stanford, CA
  • Trevor J. Hastie, Department of Statistics, Stanford University, Stanford, CA
  • Kenneth W. Church, Microsoft Research, One Microsoft Way, Redmond, WA

  • Venue:
  • COLT '07: Proceedings of the 20th Annual Conference on Learning Theory
  • Year:
  • 2007

Abstract

For dimension reduction in l1, one can multiply a data matrix A ∈ Rn×D by a random matrix R ∈ RD×k (k ≪ D) whose entries are i.i.d. samples of a Cauchy random variable. An impossibility result shows that one cannot recover the pairwise l1 distances in A from B = AR ∈ Rn×k using linear estimators. However, nonlinear estimators are still useful for certain applications in data stream computations, information retrieval, learning, and data mining. We propose three types of nonlinear estimators: the bias-corrected sample median estimator, the bias-corrected geometric mean estimator, and the bias-corrected maximum likelihood estimator. We derive tail bounds for the geometric mean estimator and establish that k = O(log n/ε²) suffices, with the constants explicitly given. Asymptotically (as k → ∞), both the sample median estimator and the geometric mean estimator are about 80% efficient relative to the maximum likelihood estimator (MLE). We analyze the moments of the MLE and propose approximating its distribution by an inverse Gaussian.
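As a concrete illustration of the scheme summarized above, the following minimal Python sketch shows the Cauchy projection step and the bias-corrected geometric mean estimator. The function names, the synthetic data, and the choice of k are illustrative assumptions, not from the paper; only the correction factor cos^k(π/(2k)) reflects the paper's construction.

```python
import numpy as np

def cauchy_projection(A, k, seed=None):
    """Multiply the data matrix A (n x D) by R (D x k) whose entries
    are i.i.d. samples of a standard Cauchy random variable."""
    rng = np.random.default_rng(seed)
    D = A.shape[1]
    R = rng.standard_cauchy(size=(D, k))
    return A @ R  # B = AR, shape (n, k)

def geometric_mean_estimate(b1, b2):
    """Bias-corrected geometric mean estimator of the l1 distance
    between two original rows, given their projected rows b1, b2.
    Each entry of b1 - b2 is Cauchy with scale d = ||u1 - u2||_1;
    multiplying the geometric mean of |b1 - b2| by cos^k(pi/(2k))
    removes the bias (for k >= 2)."""
    x = np.abs(b1 - b2)
    k = x.size
    return np.cos(np.pi / (2 * k)) ** k * np.exp(np.mean(np.log(x)))

# Illustrative usage on synthetic data (assumed, not from the paper).
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 10000))      # two rows in R^D, D = 10000
B = cauchy_projection(A, k=500, seed=1)  # project down to k = 500
true_d = np.abs(A[0] - A[1]).sum()
est_d = geometric_mean_estimate(B[0], B[1])
print(f"true l1 distance: {true_d:.1f}, estimate: {est_d:.1f}")
```

The cosine factor arises because, for a Cauchy variable with scale d, E[|x|^{1/k}] = d^{1/k} / cos(π/(2k)); the product of the k-th roots of the k absolute projected differences therefore has expectation d / cos^k(π/(2k)), and multiplying by cos^k(π/(2k)) yields an unbiased estimate of d.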