Heavy hitters, items whose frequency exceeds a given threshold, are an important aggregation and summary tool when processing data streams or data warehouses. Hierarchical heavy hitters (HHHs) have been introduced as a natural generalization for hierarchical data domains, including multi-dimensional data. An item x in a hierarchy is called a φ-HHH if its frequency, after discounting the frequencies of all its descendant hierarchical heavy hitters, exceeds φn, where φ is a user-specified parameter and n is the size of the data set. Recently, single-pass schemes have been proposed for computing φ-HHHs using space roughly O((1/φ) log(φn)). The frequency estimates of these algorithms, however, hold only for the total frequencies of items, not for the discounted frequencies; this leads to false positives because the discounted frequency can be significantly smaller than the total frequency. This paper attempts to explain the difficulty of finding hierarchical heavy hitters with better accuracy. We show that a single-pass deterministic scheme that computes φ-HHHs in a d-dimensional hierarchy with any approximation guarantee must use Ω(1/φ^{d+1}) space. This bound is tight: in fact, we present a data stream algorithm that can report the φ-HHHs without false positives in O(1/φ^{d+1}) space.
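To make the φ-HHH definition concrete, the following is a minimal sketch of an exact, offline computation for a one-dimensional hierarchy. It is not the streaming algorithm of the paper and makes no attempt to save space: it stores every item. The function name `exact_hhh` and the representation of items as tuples of path components (so that dropping the last component gives the parent) are assumptions made for illustration only.

```python
from collections import Counter

def exact_hhh(items, phi):
    """Exact offline phi-HHHs for a 1-d hierarchy (illustrative sketch).

    items: iterable of tuples; the parent of a node is node[:-1] and the
           root is the empty tuple (). This encoding is an assumption.
    Returns a dict mapping each HHH node to its discounted frequency.
    """
    items = list(items)
    n = len(items)
    threshold = phi * n
    # residual[node] = occurrences under node not yet claimed by a descendant HHH
    residual = Counter(items)
    depth = max((len(x) for x in items), default=0)
    hhh = {}
    # Process the hierarchy bottom-up, one level at a time.
    for level in range(depth, -1, -1):
        next_residual = Counter()
        for node, count in residual.items():
            if len(node) < level:
                # Shallower node: wait until its own level is processed.
                next_residual[node] += count
            elif count > threshold:
                # Discounted frequency exceeds phi * n: report as an HHH;
                # its count is not passed on to any ancestor.
                hhh[node] = count
            elif level > 0:
                # Not an HHH: pass the unclaimed count up to the parent.
                next_residual[node[:-1]] += count
        residual = next_residual
    return hhh

stream = [("a", "x")] * 5 + [("a", "y")] * 2 + [("a", "z")] * 2 + [("b", "w")]
print(exact_hhh(stream, 0.3))
```

In this example n = 10 and φn = 3. The leaf ("a", "x") is an HHH with frequency 5. The node ("a",) has total frequency 9 but discounted frequency 4, which still exceeds 3, so it is also reported. With φ = 0.45 the same node falls below the threshold after discounting even though its total frequency of 9 is well above it, which is the kind of gap between total and discounted frequency that causes false positives when only total frequencies are estimated.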