Finding hierarchical heavy hitters in data streams

Authors:
Graham Cormode;Flip Korn;S. Muthukrishnan;Divesh Srivastava
Affiliations:
Rutgers University;AT&T Labs-Research;AT&T Labs-Research;AT&T Labs-Research
Venue:
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Year:
2003



Abstract

Aggregation along hierarchies is a critical summary technique in a large variety of online applications, including decision support and network management (e.g., IP clustering, denial-of-service attack monitoring). Despite the considerable recent work on online aggregation over sets (e.g., quantiles, hot items), surprisingly little attention has been paid to summarizing hierarchical structure in stream data. The problem we study in this paper is that of finding Hierarchical Heavy Hitters (HHHs): given a hierarchy and a fraction φ, we want to find all nodes whose descendants account for at least a fraction φ of the elements in the data stream, after discounting the contribution of descendants that are themselves HHH nodes. The resulting summary gives a topological "cartogram" of the hierarchical data. We present deterministic and randomized algorithms for finding HHHs, which build upon existing techniques by incorporating the hierarchy into the algorithms. Our experiments demonstrate several factors of improvement in accuracy over the straightforward approach, a gain that comes from making the algorithms hierarchy-aware.
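
To make the HHH definition concrete, the following is a minimal sketch of an exact, offline computation over a fully stored data set. It is not one of the streaming algorithms the paper presents; it only illustrates the discounting rule. The function name `exact_hhh`, the representation of each element as a fixed-length tuple (most general component first, as with the octets of an IP address), and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def exact_hhh(items, phi):
    """Exact offline HHHs for a one-dimensional prefix hierarchy (illustrative only).

    items: list of equal-length tuples; every prefix of a tuple is a hierarchy node.
    phi:   fraction in (0, 1]; a node is reported when its discounted count
           is at least phi * len(items).
    Returns a dict mapping each HHH prefix to its discounted count.
    """
    if not items:
        return {}
    threshold = phi * len(items)
    depth = len(items[0])
    # residual[p] = number of elements under prefix p that have not yet been
    # claimed by an HHH descendant of p.
    residual = Counter(items)
    result = {}
    # Walk from the leaves (full tuples) up to the root (empty tuple).
    for level in range(depth, -1, -1):
        parent_residual = Counter()
        for prefix, count in residual.items():
            if count >= threshold:
                # This node is an HHH; its count is not passed to ancestors.
                result[prefix] = count
            elif level > 0:
                # Not heavy enough: pass the unclaimed count to the parent.
                parent_residual[prefix[:-1]] += count
        residual = parent_residual
    return result


# Hypothetical sample: ten source addresses split into octets.
sample = (
    [("10", "0", "0", "1")] * 5
    + [("10", "0", "0", "2")] * 2
    + [("10", "0", "1", "7")] * 2
    + [("192", "168", "1", "1")]
)
hhh = exact_hhh(sample, phi=0.25)
for prefix, count in hhh.items():
    print(".".join(prefix) or "*", count)
```

With this sample, the leaf 10.0.0.1 is reported on its own count of 5, and the prefix 10.0 is reported with a discounted count of 4: the two elements at 10.0.0.2 plus the two at 10.0.1.7, excluding the five already attributed to 10.0.0.1. This approach stores every distinct element, which is exactly what a streaming setting cannot afford.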