Probabilistic counting algorithms for data base applications
Journal of Computer and System Sciences
A linear-time probabilistic counting algorithm for database applications
ACM Transactions on Database Systems (TODS)
Private vs. common random bits in communication complexity
Information Processing Letters
The space complexity of approximating the frequency moments
STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing
Public vs. private coin flips in one round communication games (extended abstract)
STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing
Communication complexity
Size-estimation framework with applications to transitive closure and reachability
Journal of Computer and System Sciences
New sampling-based summary statistics for improving approximate query answers
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Min-wise independent permutations (extended abstract)
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Tracking join and self-join sizes in limited storage
PODS '99 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Join synopses for approximate query answering
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
The Aqua approximate query answering system
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
On randomized one-round communication complexity
Computational Complexity
Synopsis data structures for massive data sets
External memory algorithms
Towards estimation error guarantees for distinct values
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Identifying Representative Trends in Massive Time Series Data Sets Using Sketches
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Sampling-Based Estimation of the Number of Distinct Values of an Attribute
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
An Approximate L1-Difference Algorithm for Massive Data Streams
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Stable distributions, pseudorandom generators, embeddings and data stream computation
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Distributed streams algorithms for sliding windows
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports
Proceedings of the 27th International Conference on Very Large Data Bases
Counting Distinct Elements in a Data Stream
RANDOM '02 Proceedings of the 6th International Workshop on Randomization and Approximation Techniques
Estimating Rarity and Similarity over Data Stream Windows
ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms
Comparing Data Streams Using Hamming Norms (How to Zero In)
IEEE Transactions on Knowledge and Data Engineering
Issues in data stream management
ACM SIGMOD Record
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Processing set expressions over continuous update streams
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Approximate Aggregation Techniques for Sensor Databases
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
An improved data stream algorithm for frequency moments
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Synopsis diffusion for robust aggregation in sensor networks
SenSys '04 Proceedings of the 2nd international conference on Embedded networked sensor systems
Tracking set-expression cardinalities over continuous update streams
The VLDB Journal — The International Journal on Very Large Data Bases
Range-Efficient Computation of F" over Massive Data Streams
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Finding (Recently) Frequent Items in Distributed Data Streams
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Sampling search-engine results
WWW '05 Proceedings of the 14th international conference on World Wide Web
Join-distinct aggregate estimation over update streams
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Space efficient mining of multigraph streams
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
An improved data stream summary: the count-min sketch and its applications
Journal of Algorithms
Summarizing and mining inverse distributions on data streams via dynamic inverse sampling
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Maintaining significant stream statistics over sliding windows
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
A geometric approach to monitoring threshold functions over distributed data streams
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Sketching asynchronous streams over a sliding window
Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing
Data streams: algorithms and applications
Foundations and Trends® in Theoretical Computer Science
Maintaining stream statistics over multiscale sliding windows
ACM Transactions on Database Systems (TODS)
Spatially-decaying aggregation over a network
Journal of Computer and System Sciences
Counting distinct items over update streams
Theoretical Computer Science
On synopses for distinct-value estimation under multiset operations
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Time-decaying sketches for sensor data aggregation
Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing
Comparing data streams using Hamming norms (how to zero in)
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
A geometric approach to monitoring threshold functions over distributed data streams
ACM Transactions on Database Systems (TODS)
Synopsis diffusion for robust aggregation in sensor networks
ACM Transactions on Sensor Networks (TOSN)
Why go logarithmic if we can go linear?: Towards effective distinct counting of search traffic
EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Processing top k queries from samples
CoNEXT '06 Proceedings of the 2006 ACM CoNEXT conference
Processing top-k queries from samples
Computer Networks: The International Journal of Computer and Telecommunications Networking
Estimating Hybrid Frequency Moments of Data Streams
FAW '08 Proceedings of the 2nd annual international workshop on Frontiers in Algorithmics
Online maintenance of very large random samples on flash storage
Proceedings of the VLDB Endowment
On Estimating Frequency Moments of Data Streams
APPROX '07/RANDOM '07 Proceedings of the 10th International Workshop on Approximation and the 11th International Workshop on Randomization, and Combinatorial Optimization. Algorithms and Techniques
Multi-query optimization for sketch-based estimation
Information Systems
Two improved range-efficient algorithms for F0 estimation
Theoretical Computer Science
Robust approximate aggregation in sensor data management systems
ACM Transactions on Database Systems (TODS)
Leveraging discarded samples for tighter estimation of multiple-set aggregates
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
A Note on Estimating Hybrid Frequency Moment of Data Streams
AAIM '09 Proceedings of the 5th International Conference on Algorithmic Aspects in Information and Management
Coordinated weighted sampling for estimating aggregates over multiple weight assignments
Proceedings of the VLDB Endowment
Online maintenance of very large random samples on flash storage
The VLDB Journal — The International Journal on Very Large Data Bases
Recursive n-gram hashing is pairwise independent, at best
Computer Speech and Language
Aggregate computation over data streams
APWeb'08 Proceedings of the 10th Asia-Pacific web conference on Progress in WWW research and development
An optimal algorithm for the distinct elements problem
Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Exponential time improvement for min-wise based algorithms
Information and Computation
Time-decaying Sketches for Robust Aggregation of Sensor Data
SIAM Journal on Computing
Get the most out of your sample: optimal unbiased estimators using partial information
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Space-efficient tracking of persistent items in a massive data stream
Proceedings of the 5th ACM international conference on Distributed event-based system
Continuous distributed monitoring: a short survey
Proceedings of the First International Workshop on Algorithms and Models for Distributed Event Processing
Optimal random sampling from distributed streams revisited
DISC'11 Proceedings of the 25th international conference on Distributed computing
Distinct estimate of set expressions over sliding windows
APWeb'05 Proceedings of the 7th Asia-Pacific web conference on Web Technologies Research and Development
Counting distinct items over update streams
ISAAC'05 Proceedings of the 16th international conference on Algorithms and Computation
Exponential time improvement for min-wise based algorithms
Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms
Estimating hybrid frequency moments of data streams
Journal of Combinatorial Optimization
An efficient algorithm for frequent itemset mining on data streams
ICDM'06 Proceedings of the 6th Industrial Conference on Data Mining conference on Advances in Data Mining: applications in Medicine, Web Mining, Marketing, Image and Signal Mining
On approximation algorithms for data mining applications
Efficient Approximation and Online Algorithms
On estimating path aggregates over streaming graphs
ISAAC'06 Proceedings of the 17th international conference on Algorithms and Computation
Continuous monitoring in the dynamic sensor field model
ALGOSENSORS'11 Proceedings of the 7th international conference on Algorithms for Sensor Systems, Wireless Ad Hoc Networks and Autonomous Mobile Entities
On power-law distributed balls in bins and its applications to view size estimation
ISAAC'11 Proceedings of the 22nd international conference on Algorithms and Computation
Randomized algorithms for tracking distributed count, frequencies, and ranks
PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
Don't let the negatives bring you down: sampling from streams of signed updates
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
Survey: Computational models for networks of tiny artifacts: A survey
Computer Science Review
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches
Foundations and Trends in Databases
Tracking distributed aggregates over time-based sliding windows
SSDBM'12 Proceedings of the 24th international conference on Scientific and Statistical Database Management
Continuous monitoring in the dynamic sensor field model
Theoretical Computer Science
The continuous distributed monitoring model
ACM SIGMOD Record
Hi-index | 0.00 |
Massive data sets often arise as physically distributed, parallel data streams. We present algorithms for estimating simple functions on the union of such data streams, while using only logarithmic space per stream. Each processor observes only its own stream, and communicates with the other processors only after observing its entire stream. This models the set-up in current network monitoring products. Our algorithms employ a novel coordinated sampling technique to extract a sample of the union; this sample can be used to estimate aggregate functions on the union. The technique can also be used to estimate aggregate functions over the distinct “labels” in one or more data streams, e.g., to determine the zeroth frequency moment (i.e., the number of distinct labels) in one or more data streams. Our space and time bounds are the best known for these problems, and our logarithmic space bounds for coordinated sampling contrast with polynomial lower bounds for independent sampling. We relate our distributed streams model to previously studied non-distributed (i.e., merged) streams models, presenting tight bounds on the gap between the distributed and merged models for deterministic algorithms.