Fast Manhattan sketches in data streams

Authors:
Jelani Nelson;David P. Woodruff
Affiliations:
MIT, Cambridge, MA, USA;IBM, San Jose, CA, USA
Venue:
Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2010

Citing 37
Cited 5

Robust regression and outlier detection

Robust regression and outlier detection
On random sampling over joins

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Bottom-up computation of sparse and Iceberg CUBE

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
The space complexity of approximating the frequency moments

Journal of Computer and System Sciences
Efficient computation of Iceberg cubes with complex measures

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
An Approximate L1-Difference Algorithm for Massive Data Streams

SIAM Journal on Computing
Computing Iceberg Queries Efficiently

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Fast Time Sequence Indexing for Arbitrary Lp Norms

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases

VLDB '95 Proceedings of the 21st International Conference on Very Large Data Bases
Finding Frequent Items in Data Streams

ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
Comparing Data Streams Using Hamming Norms (How to Zero In)

IEEE Transactions on Knowledge and Data Engineering
Optimal space lower bounds for all frequency moments

SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Tabulation based 4-universal hashing with applications to second moment estimation

SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
What's hot and what's not: tracking most frequent items dynamically

ACM Transactions on Database Systems (TODS) - Special Issue: SIGMOD/PODS 2003
Space efficient mining of multigraph streams

Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Space complexity of hierarchical heavy hitters in multi-dimensional data streams

Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Subgradient and sampling algorithms for l1 regression

SODA '05 Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms
An improved data stream summary: the count-min sketch and its applications

Journal of Algorithms
Sketching streams through the net: distributed approximate query tracking

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Summarizing and mining inverse distributions on data streams via dynamic inverse sampling

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Space- and time-efficient deterministic algorithms for biased quantiles over data streams

Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Stable distributions, pseudorandom generators, embeddings, and data stream computation

Journal of the ACM (JACM)
Very sparse random projections

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
One sketch for all: fast algorithms for compressed sensing

Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
Algorithms and estimators for accurate summarization of internet traffic

Proceedings of the 7th ACM SIGCOMM conference on Internet measurement
Reversible sketches: enabling monitoring and analysis over high-speed data streams

IEEE/ACM Transactions on Networking (TON)
Estimators and tail bounds for dimension reduction in lα (0

Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Declaring independence via the sketching of sketches

Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Time-decaying aggregates in out-of-order streams

Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Uniform Hashing in Constant Time and Optimal Space

SIAM Journal on Computing
Space-optimal heavy hitters with strong error bounds

Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
The Data Stream Space Complexity of Cascaded Norms

FOCS '09 Proceedings of the 2009 50th Annual IEEE Symposium on Foundations of Computer Science
Coresets and sketches for high dimensional subspace approximation problems

SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
1-pass relative-error Lp-sampling with applications

SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
On the exact space complexity of sketching and streaming small norms

SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Polylogarithmic private approximations and efficient matching

TCC'06 Proceedings of the Third conference on Theory of Cryptography
Mathematical programming algorithms for regression-based nonlinear filtering in R^N

IEEE Transactions on Signal Processing

Near-optimal private approximation protocols via a black box transformation

Proceedings of the forty-third annual ACM symposium on Theory of computing
Fast moment estimation in data streams in optimal space

Proceedings of the forty-third annual ACM symposium on Theory of computing
Mergeable summaries

PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
Low rank approximation and regression in input sparsity time

Proceedings of the forty-fifth annual ACM symposium on Theory of computing
Tight lower bound for linear sketches of moments

ICALP'13 Proceedings of the 40th international conference on Automata, Languages, and Programming - Volume Part I

Abstract

The L1-distance, also known as the Manhattan or taxicab distance, between two vectors x, y in R^n is ∑_{i=1}^{n} |x_i - y_i|. Approximating this distance is a fundamental primitive on massive databases, with applications to clustering, nearest neighbor search, network monitoring, regression, sampling, and support vector machines. We give the first 1-pass streaming algorithm for this problem in the turnstile model with O*(1/ε^2) space and O*(1) update time. The O* notation hides polylogarithmic factors in ε, n, and the precision required to store vector entries. All previous algorithms either required Ω(1/ε^3) space or Ω(1/ε^2) update time and/or could not work in the turnstile model (i.e., support an arbitrary number of updates to each coordinate). Our bounds are optimal up to O*(1) factors.
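
For orientation, the setting of the abstract can be sketched in Python. This is not the algorithm of this paper: it is a minimal version of the classical Cauchy random projection with a median estimator, in the style of the cited work on stable distributions. It touches every counter on each update, which is the per-update cost the abstract says is reduced to O*(1). The class name `CauchyL1Sketch`, its parameters, and the string-seeded per-entry generator are assumptions made for illustration only; a practical implementation would replace the reseeding trick with limited-independence hashing.

```python
import math
import random
import statistics


class CauchyL1Sketch:
    """Illustrative linear sketch S*x with standard Cauchy entries.

    Hypothetical helper, not the paper's data structure.
    """

    def __init__(self, num_rows, seed=0):
        self.num_rows = num_rows          # roughly 1/eps^2 counters
        self.seed = seed                  # shared seed => same matrix S
        self.counters = [0.0] * num_rows

    def _entry(self, row, index):
        # Regenerate the (row, index) entry of S on demand so that S is
        # never stored. Seeding with a string is deterministic in CPython.
        rng = random.Random(f"{self.seed}:{row}:{index}")
        # Inverse-CDF sampling of a standard Cauchy random variable.
        return math.tan(math.pi * (rng.random() - 0.5))

    def update(self, index, delta):
        # Turnstile update x[index] += delta; delta may be negative.
        # Cost is O(num_rows) per update in this baseline.
        for row in range(self.num_rows):
            self.counters[row] += delta * self._entry(row, index)

    def estimate_distance(self, other):
        # Each counter difference is Cauchy with scale ||x - y||_1, and the
        # median of |standard Cauchy| is 1, so the sample median of the
        # absolute differences estimates the L1-distance.
        diffs = [abs(a - b) for a, b in zip(self.counters, other.counters)]
        return statistics.median(diffs)


if __name__ == "__main__":
    sx, sy = CauchyL1Sketch(400, seed=7), CauchyL1Sketch(400, seed=7)
    for i in range(10):
        sx.update(i, i + 1.0)   # x_i = i + 1
        sy.update(i, 5.0)       # insert, then partially delete:
        sy.update(i, -3.0)      # y_i = 2
    exact = sum(abs((i + 1.0) - 2.0) for i in range(10))
    print("exact:", exact, "estimate:", sx.estimate_distance(sy))
```

Both sketches must be built with the same `seed` so that they share one random matrix; because the sketch is linear, the difference of the two counter arrays is itself a sketch of x - y, which is why insertions and deletions in any order are supported.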