Massive data sets are increasingly important in a wide range of applications, including observational sciences, product marketing, and the monitoring and operations of large systems. In network operations, raw data typically arrive in streams, and decisions must be made by algorithms that make one pass over each stream, throw much of the raw data away, and produce "synopses" or "sketches" for further processing. Moreover, network-generated massive data sets are often distributed: several different, physically separated network elements may receive or generate data streams that, together, comprise one logical data set; to be of use in operations, the streams must be analyzed locally and their synopses sent to a central operations facility. The enormous scale, distributed nature, and one-pass processing requirement on the data sets of interest must be addressed with new algorithmic techniques.

We present one fundamental new technique here: a space-efficient, one-pass algorithm for approximating the L1-difference $\sum_i|a_i-b_i|$ between two functions, when the function values $a_i$ and $b_i$ are given as data streams, and their order is chosen by an adversary. Our main technical innovation, which may be of interest outside the realm of massive data stream algorithmics, is a method of constructing families $\{V_j(s)\}$ of limited-independence random variables that are range-summable, by which we mean that $\sum_{j=0}^{c-1} V_j(s)$ is computable in time polylog(c) for all seeds s. Our L1-difference algorithm can be viewed as a "sketching" algorithm, in the sense of [Broder et al., J. Comput. System Sci., 60 (2000), pp. 630--659], and our technique performs better than that of Broder et al. when used to approximate the symmetric difference of two sets with small symmetric difference.
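To make the sketching idea concrete, here is a minimal Python sketch under stated assumptions; it is not the construction from the paper. The helper names (`sign`, `range_sum`, `sketch`, `estimate_l1`) are hypothetical, the ±1 values come from a hash rather than from a limited-independence family, and `range_sum` is a naive O(c) loop standing in for the polylog(c) range-summable computation the abstract describes. It assumes each index i appears once per stream, as a pair (i, value), in arbitrary order. For zero-mean, pairwise-uncorrelated ±1 variables, the squared difference of the two counters has expectation $\sum_i|a_i-b_i|$, which is what the estimator averages.

```python
import hashlib
import statistics


def sign(seed: int, i: int, j: int) -> int:
    """Hypothetical stand-in for V_{i,j}(s): a +/-1 value derived from a hash.

    Assumption: a hash replaces the limited-independence family of the paper.
    """
    digest = hashlib.blake2b(f"{seed}:{i}:{j}".encode(), digest_size=8).digest()
    return 1 if digest[0] & 1 else -1


def range_sum(seed: int, i: int, c: int) -> int:
    """Sum of V_{i,j}(s) for j in [0, c).

    Naive O(c) loop; a range-summable family would compute this in polylog(c).
    """
    return sum(sign(seed, i, j) for j in range(c))


def sketch(stream, seeds):
    """One pass over (i, value) pairs in any order; keeps one counter per seed."""
    counters = [0] * len(seeds)
    for i, value in stream:
        for k, seed in enumerate(seeds):
            counters[k] += range_sum(seed, i, value)
    return counters


def estimate_l1(sketch_a, sketch_b, groups=5):
    """Median of group means of squared counter differences.

    Assumes len(sketch_a) == len(sketch_b) and at least `groups` counters.
    """
    squares = [(x - y) ** 2 for x, y in zip(sketch_a, sketch_b)]
    size = len(squares) // groups
    means = [
        statistics.mean(squares[g * size:(g + 1) * size]) for g in range(groups)
    ]
    return statistics.median(means)


if __name__ == "__main__":
    seeds = list(range(50))
    a = [(0, 5), (1, 3), (2, 7)]
    b = [(2, 9), (0, 1), (1, 3)]  # same indices, different order and values
    true_l1 = 4 + 0 + 2
    print("true:", true_l1)
    print("estimate:", estimate_l1(sketch(a, seeds), sketch(b, seeds)))
```

Each stream is summarized independently by `sketch`, so two separated sites sharing only the seeds can send their counter lists to a central facility, which calls `estimate_l1`. Because the counters are sums, the result does not depend on arrival order. More seeds tighten the estimate at the cost of more space.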