Space complexity of hierarchical heavy hitters in multi-dimensional data streams

  • Authors:
  • John Hershberger (Mentor Graphics Corp., Wilsonville, OR)
  • Nisheeth Shrivastava (University of California at Santa Barbara, Santa Barbara, CA)
  • Subhash Suri (University of California at Santa Barbara, Santa Barbara, CA)
  • Csaba D. Tóth (MIT, Cambridge, MA)

  • Venue:
  • Proceedings of the Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS '05)
  • Year:
  • 2005

Abstract

Heavy hitters, which are items occurring with frequency above a given threshold, are an important aggregation and summary tool when processing data streams or data warehouses. Hierarchical heavy hitters (HHHs) have been introduced as a natural generalization for hierarchical data domains, including multi-dimensional data. An item x in a hierarchy is called a φ-HHH if its frequency, after discounting the frequencies of all its descendant hierarchical heavy hitters, exceeds φn, where φ is a user-specified parameter and n is the size of the data set. Recently, single-pass schemes have been proposed for computing φ-HHHs using space roughly O((1/φ) log(φn)). The frequency estimates of these algorithms, however, hold only for the total frequencies of items, and not the discounted frequencies; this leads to false positives because the discounted frequency can be significantly smaller than the total frequency. This paper attempts to explain the difficulty of finding hierarchical heavy hitters with better accuracy. We show that a single-pass deterministic scheme that computes φ-HHHs in a d-dimensional hierarchy with any approximation guarantee must use Ω(1/φ^{d+1}) space. This bound is tight: in fact, we present a data stream algorithm that can report the φ-HHHs without false positives in O(1/φ^{d+1}) space.
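To make the discounted-frequency definition concrete, the following is a minimal offline sketch (not the paper's streaming algorithm) that computes exact φ-HHHs on a one-dimensional hierarchy. The path-string labels and the `parent` helper are illustrative assumptions, not notation from the paper; the point is the bottom-up discounting step, where a node's count is only compared against φn after subtracting the counts already claimed by its descendant HHHs.

```python
from collections import Counter

def phi_hhh(items, phi, parent):
    """Exact (offline) phi-HHHs on a 1-D hierarchy.

    items:  iterable of leaf labels
    phi:    threshold in (0, 1]
    parent: maps a label to its parent label, or None at the root

    A node is a phi-HHH if its frequency, after discounting the
    frequencies of all its descendant HHHs, exceeds phi * n.
    """
    n = 0
    freq = Counter()  # total frequency of every node on each leaf's root path
    for leaf in items:
        n += 1
        node = leaf
        while node is not None:
            freq[node] += 1
            node = parent(node)

    def depth(node):
        d = 0
        while parent(node) is not None:
            node, d = parent(node), d + 1
        return d

    # Process nodes deepest-first so descendant HHHs are decided before
    # their ancestors, then discount each new HHH's count from all ancestors.
    discounted = dict(freq)
    hhhs = set()
    for node in sorted(freq, key=depth, reverse=True):
        if discounted[node] > phi * n:
            hhhs.add(node)
            anc = parent(node)
            while anc is not None:
                discounted[anc] -= discounted[node]
                anc = parent(anc)
    return hhhs

# Hypothetical example hierarchy: 'a/x' -> 'a' -> '' (root of ''); e.g. IP
# prefixes or product categories would work the same way.
def parent(label):
    if label == "":
        return None
    return label.rsplit("/", 1)[0] if "/" in label else ""
```

For instance, with 5 copies of "a/x", 4 of "a/y", and 2 of "b/z" (n = 11), at φ = 0.35 the leaves "a/x" and "a/y" are HHHs and "a" is not, because its discounted count drops to 0; at φ = 0.5 neither leaf qualifies, so "a" keeps its full count of 9 and becomes the only HHH. This is exactly the effect the abstract highlights: the discounted frequency of a node can be far smaller than its total frequency.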