We introduce a new sublinear-space data structure, the count-min (CM) sketch, for summarizing data streams. Our sketch allows fundamental queries in data stream summarization, such as point, range, and inner product queries, to be approximately answered very quickly. In addition, it can be applied to solve several important problems in data streams, such as finding quantiles and frequent items. The time and space bounds we show for using the CM sketch to solve these problems significantly improve on those previously known, typically reducing the dependence on the accuracy parameter from 1/ε² to 1/ε.
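To make the structure concrete, the following is a minimal Python sketch of a count-min sketch supporting point and inner product queries. It is an illustration under stated assumptions, not the authors' implementation: the class name `CountMinSketch`, the method names, the `seed` parameter, and the particular hash family (a random linear function modulo a Mersenne prime) are all choices made here. The sizing uses the commonly stated parameters, width ⌈e/ε⌉ and depth ⌈ln(1/δ)⌉, and the code assumes non-negative integer keys and non-negative update counts.

```python
import math
import random


class CountMinSketch:
    """Illustrative count-min sketch for non-negative integer keys.

    Names and the hash family are assumptions made for this example.
    """

    PRIME = (1 << 61) - 1  # Mersenne prime; keys are assumed smaller than this

    def __init__(self, epsilon, delta, seed=0):
        # Width controls the additive error; depth controls the failure probability.
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        rng = random.Random(seed)
        # One (a, b) pair per row defines h(x) = ((a*x + b) mod PRIME) mod width.
        self.hashes = [
            (rng.randrange(1, self.PRIME), rng.randrange(0, self.PRIME))
            for _ in range(self.depth)
        ]
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0  # running sum of all update counts

    def _index(self, row, key):
        a, b = self.hashes[row]
        return ((a * key + b) % self.PRIME) % self.width

    def update(self, key, count=1):
        """Add `count` (assumed non-negative) to the frequency of `key`."""
        self.total += count
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def point_query(self, key):
        """Estimate the frequency of `key`; never below the true count
        when all updates are non-negative."""
        return min(
            self.table[row][self._index(row, key)] for row in range(self.depth)
        )

    def inner_product(self, other):
        """Estimate the inner product of two frequency vectors.

        Both sketches must share dimensions and hash functions
        (here: same epsilon, delta, and seed).
        """
        if self.hashes != other.hashes or self.width != other.width:
            raise ValueError("sketches must use identical hash functions")
        return min(
            sum(x * y for x, y in zip(self.table[row], other.table[row]))
            for row in range(self.depth)
        )


# Usage: summarize a small stream and query one item.
sketch = CountMinSketch(epsilon=0.01, delta=0.01, seed=42)
for item in [7, 7, 3, 7, 11, 3]:
    sketch.update(item)
estimate = sketch.point_query(7)  # at least 3, the true count of item 7
```

Each update touches one counter per row, so update and point-query time scale with the depth, while space scales with width times depth. Taking the minimum across rows is what keeps the overestimate caused by hash collisions small. Range queries, which the abstract also mentions, would need additional machinery (for example, one sketch per dyadic level) and are left out of this example.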