We consider a router on the Internet analyzing the statistical properties of a TCP/IP packet stream. A fundamental difficulty with measuring traffic behavior on the Internet is that there is simply too much data to be recorded for later analysis, on the order of gigabytes per second. As a result, network routers can collect only relatively few statistics about the data. The central problem addressed here is to use the limited memory of routers to determine essential features of the network traffic stream. A particularly difficult and representative subproblem is to determine the top k categories to which the most packets belong, for a desired value of k and for a given notion of categorization such as the destination IP address.

We present an algorithm that deterministically finds (in particular) all categories having a frequency above 1/(m+1) using m counters, which we prove is best possible in the worst case. We also present a sampling-based algorithm for the case that packet categories follow an arbitrary distribution, but their order over time is permuted uniformly at random. Under this model, our algorithm identifies flows above a frequency threshold of roughly 1/√(nm) with high probability, where m is the number of counters and n is the number of packets observed. This guarantee is not far off from the ideal of identifying all flows (probability 1/n), and we prove that it is best possible up to a logarithmic factor. We show that the algorithm ranks the identified flows according to frequency within any desired constant factor of accuracy.
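The abstract does not spell out the deterministic algorithm, so the following is only a minimal sketch of one well-known counter-based scheme (in the style of Misra and Gries) that meets the stated guarantee: with m counters, every category whose frequency exceeds 1/(m+1) of the stream is still being tracked at the end. The function name `frequent_candidates` and the example stream are hypothetical, chosen for illustration, and the paper's own procedure may differ in detail.

```python
def frequent_candidates(stream, m):
    """Return a dict of at most m tracked categories (assumed sketch).

    Any category occurring more than n/(m+1) times in a stream of
    n items is guaranteed to be a key of the returned dict. The
    stored counts are lower bounds, not exact frequencies.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Already tracked: count this occurrence.
            counters[item] += 1
        elif len(counters) < m:
            # A counter is free: start tracking this category.
            counters[item] = 1
        else:
            # All m counters are busy: decrement every counter and
            # release those that reach zero. The new item is dropped.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical stream of 11 packets keyed by destination category.
packets = ["a", "b", "c", "a", "d", "a", "b", "e", "a", "b", "a"]
candidates = frequent_candidates(packets, m=2)
```

Here "a" occurs 5 times out of 11, which exceeds 11/3, so it must appear among the candidates; "b" occurs only 3 times and carries no such guarantee. The result can contain false positives, so a second pass over the data would be needed to confirm exact frequencies. For a sense of scale, with m = 1000 counters and n = 10^9 packets, the deterministic threshold 1/(m+1) is about 10^-3, whereas the random-order threshold 1/√(nm) is 10^-6.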