Data cube approximation and histograms via wavelets
Proceedings of the seventh international conference on Information and knowledge management
Multi-dimensional selectivity estimation using compressed histogram information
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Bottom-up computation of sparse and Iceberg CUBE
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Iceberg-cube computation with PC clusters
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
Dynamic multidimensional histograms
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Exploiting hierarchical domain structure to compute similarity
ACM Transactions on Information Systems (TOIS)
On the Computation of Multidimensional Aggregates
VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Finding Frequent Items in Data Streams
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
Frequency Estimation of Internet Packet Streams with Limited Space
ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms
A simple algorithm for finding frequent elements in streams and bags
ACM Transactions on Database Systems (TODS)
What's hot and what's not: tracking most frequent items dynamically
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Automatically inferring patterns of resource consumption in network traffic
Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications
Gigascope: a stream database for network applications
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Holistic UDAFs at streaming speeds
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Diamond in the rough: finding Hierarchical Heavy Hitters in multi-dimensional data
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Online identification of hierarchical heavy hitters: algorithms, evaluation, and applications
Proceedings of the 4th ACM SIGCOMM conference on Internet measurement
Finding (Recently) Frequent Items in Distributed Data Streams
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Space complexity of hierarchical heavy hitters in multi-dimensional data streams
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
An improved data stream summary: the count-min sketch and its applications
Journal of Algorithms
Approximate frequency counts over data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
How to summarize the universe: dynamic maintenance of quantiles
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
The generalized MDL approach for summarization
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Finding hierarchical heavy hitters in data streams
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Efficient computation of frequent and top-k elements in data streams
ICDT'05 Proceedings of the 10th international conference on Database Theory
Data Streaming with Affinity Propagation
ECML PKDD '08 Proceedings of the European conference on Machine Learning and Knowledge Discovery in Databases - Part II
Anomaly extraction in backbone networks using association rules
Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference
A heuristic method of finding heavy hitter prefix pairs in IP traffic
IEEE Communications Letters
Online measurement of large traffic aggregates on commodity switches
Hot-ICE'11 Proceedings of the 11th USENIX conference on Hot topics in management of internet, cloud, and enterprise networks and services
Structure-aware sampling on data streams
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Structure-aware sampling on data streams
ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Towards adjusting mobile devices to user's behaviour
MSM'10/MUSE'10 Proceedings of the 2010 international conference on Analysis of social media and ubiquitous data
Anomaly extraction in backbone networks using association rules
IEEE/ACM Transactions on Networking (TON)
Software defined traffic measurement with OpenSketch
nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
FaRNet: Fast recognition of high-dimensional patterns from big network traffic data
Computer Networks: The International Journal of Computer and Telecommunications Networking
Hi-index | 0.00 |
Data items that arrive online as streams typically have attributes which take values from one or more hierarchies (time and geographic location, source and destination IP addresses, etc.). Providing an aggregate view of such data is important for summarization, visualization, and analysis. We develop an aggregate view based on certain organized sets of large-valued regions (“heavy hitters”) corresponding to hierarchically discounted frequency counts. We formally define the notion of hierarchical heavy hitters (HHHs). We first consider computing (approximate) HHHs over a data stream drawn from a single hierarchical attribute. We formalize the problem and give deterministic algorithms to find them in a single pass over the input. In order to analyze a wider range of realistic data streams (e.g., from IP traffic-monitoring applications), we generalize this problem to multiple dimensions. Here, the semantics of HHHs are more complex, since a “child” node can have multiple “parent” nodes. We present online algorithms that find approximate HHHs in one pass, with provable accuracy guarantees. The product of hierarchical dimensions forms a mathematical lattice structure. Our algorithms exploit this structure, and so are able to track approximate HHHs using only a small, fixed number of statistics per stored item, regardless of the number of dimensions. We show experimentally, using real data, that our proposed algorithms yields outputs which are very similar (virtually identical, in many cases) to offline computations of the exact solutions, whereas straightforward heavy-hitters-based approaches give significantly inferior answer quality. Furthermore, the proposed algorithms result in an order of magnitude savings in data structure size while performing competitively.