For dimension reduction in l1, one can multiply a data matrix A ∈ R^{n×D} by a random matrix R ∈ R^{D×k} (k ≪ D) whose entries are i.i.d. samples from the standard Cauchy distribution. The known impossibility result implies that the pairwise l1 distances in A cannot be recovered from B = AR ∈ R^{n×k} using linear estimators. However, nonlinear estimators remain useful for certain applications in data stream computation, information retrieval, learning, and data mining. We propose three types of nonlinear estimators: the bias-corrected sample median estimator, the bias-corrected geometric mean estimator, and the bias-corrected maximum likelihood estimator. We derive tail bounds for the geometric mean estimator and establish that k = O(log n/ε^2) suffices, with the constants given explicitly. Asymptotically (as k → ∞), both the sample median estimator and the geometric mean estimator are about 80% efficient relative to the maximum likelihood estimator (MLE). We analyze the moments of the MLE and propose approximating its distribution by an inverse Gaussian.
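To make the construction concrete, here is a minimal Python sketch (dimensions, data, and function names are illustrative, not from the paper) of the Cauchy random projection B = AR together with two of the estimators discussed. For a pair of rows u1, u2 of A, each projected difference x_j = (u1 − u2)·r_j is Cauchy with scale d = ||u1 − u2||_1, so E|x_j|^{1/k} = d^{1/k}/cos(π/(2k)); multiplying the geometric mean of the |x_j| by cos^k(π/(2k)) therefore removes the bias. The sample median estimator is shown in its simple asymptotic form, without the finite-sample bias correction derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: n points in D dimensions, projected to k dimensions.
n, D, k = 5, 10_000, 500

# Hypothetical data matrix A; any real-valued data works here.
A = rng.laplace(size=(n, D))

# Cauchy projection matrix R: entries are i.i.d. standard Cauchy samples.
R = rng.standard_cauchy(size=(D, k))
B = A @ R  # projected data, shape (n, k)

def l1_geometric_mean(b1, b2):
    """Bias-corrected geometric mean estimate of the l1 distance.

    x_j = b1_j - b2_j is Cauchy with scale d = ||u1 - u2||_1, and
    E[prod_j |x_j|^(1/k)] = d / cos(pi/(2k))^k, so scaling by
    cos(pi/(2k))^k makes the estimator unbiased.
    """
    m = len(b1)
    x = np.abs(b1 - b2)
    # Average the logs instead of multiplying: individual Cauchy
    # samples can be huge, and the direct product would overflow.
    log_gm = np.mean(np.log(x))
    return np.exp(log_gm) * np.cos(np.pi / (2 * m)) ** m

def l1_sample_median(b1, b2):
    """Sample median estimate: the median of |Cauchy(0, d)| is d."""
    return np.median(np.abs(b1 - b2))

d_true = np.abs(A[0] - A[1]).sum()
print("true l1 distance:      ", d_true)
print("geometric mean estimate:", l1_geometric_mean(B[0], B[1]))
print("sample median estimate: ", l1_sample_median(B[0], B[1]))
```

Both estimates concentrate around the true distance as k grows; note that a plain average of the x_j would be useless here, since the Cauchy distribution has no finite mean, which is why the nonlinear (median and geometric mean) estimators are needed.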