Data cube approximation and histograms via wavelets
Proceedings of the seventh international conference on Information and knowledge management
Multi-dimensional selectivity estimation using compressed histogram information
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Bottom-up computation of sparse and Iceberg CUBE
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
Dynamic multidimensional histograms
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Data streams: algorithms and applications
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
What's hot and what's not: tracking most frequent items dynamically
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Automatically inferring patterns of resource consumption in network traffic
Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications
Approximate counts and quantiles over sliding windows
PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Approximate frequency counts over data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
The generalized MDL approach for summarization
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Finding hierarchical heavy hitters in data streams
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Online identification of hierarchical heavy hitters: algorithms, evaluation, and applications
Proceedings of the 4th ACM SIGCOMM conference on Internet measurement
Duplicate detection in click streams
WWW '05 Proceedings of the 14th international conference on World Wide Web
Space complexity of hierarchical heavy hitters in multi-dimensional data streams
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Detecting malicious network traffic using inverse distributions of packet contents
Proceedings of the 2005 ACM SIGCOMM workshop on Mining network data
The hunting of the bump: on maximizing statistical discrepancy
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
An integrated efficient solution for computing frequent and top-k elements in data streams
ACM Transactions on Database Systems (TODS)
Data streams: algorithms and applications
Foundations and Trends® in Theoretical Computer Science
Finding hierarchical heavy hitters in network measurement system
Proceedings of the 2007 ACM symposium on Applied computing
Finding hierarchical heavy hitters in streaming data
ACM Transactions on Knowledge Discovery from Data (TKDD)
Why go logarithmic if we can go linear?: Towards effective distinct counting of search traffic
EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
A scalable sampling scheme for clustering in network traffic analysis
Proceedings of the 2nd international conference on Scalable information systems
Finding frequent items in probabilistic data
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Using 2D Hierarchical Heavy Hitters to Investigate Binary Relationships
Visual Data Mining
Analysis and Interpretation of Visual Hierarchical Heavy Hitters of Binary Relations
ADBIS '08 Proceedings of the 12th East European conference on Advances in Databases and Information Systems
Pruning attribute values from data cubes with diamond dicing
IDEAS '08 Proceedings of the 2008 international symposium on Database engineering & applications
Relaxation in text search using taxonomies
Proceedings of the VLDB Endowment
SLEUTH: Single-pubLisher attack dEtection Using correlaTion Hunting
Proceedings of the VLDB Endowment
HIDS: a multifunctional generator of hierarchical data streams
ACM SIGMIS Database
Towards automated performance diagnosis in a large IPTV network
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
Aggregate computation over data streams
APWeb'08 Proceedings of the 10th Asia-Pacific web conference on Progress in WWW research and development
URCA: pulling out anomalies by their root causes
INFOCOM'10 Proceedings of the 29th conference on Information communications
Detecting the performance impact of upgrades in large operational networks
Proceedings of the ACM SIGCOMM 2010 conference
International Journal of Network Management
Distributed frequent items detection on uncertain data
ADMA'10 Proceedings of the 6th international conference on Advanced data mining and applications: Part I
Space-efficient tracking of persistent items in a massive data stream
Proceedings of the 5th ACM international conference on Distributed event-based system
Towards adjusting mobile devices to user's behaviour
MSM'10/MUSE'10 Proceedings of the 2010 international conference on Analysis of social media and ubiquitous data
Streaming Solutions for Fine-Grained Network Traffic Measurements and Analysis
Proceedings of the 2011 ACM/IEEE Seventh Symposium on Architectures for Networking and Communications Systems
Traffic modeling and classification using packet train length and packet train size
IPOM'06 Proceedings of the 6th IEEE international conference on IP Operations and Management
Adapting OLAP analysis to the user's interest through virtual cubes
FSKD'06 Proceedings of the Third international conference on Fuzzy Systems and Knowledge Discovery
Optimal source-based filtering of malicious traffic
IEEE/ACM Transactions on Networking (TON)
Optimizing adaptive multi-route query processing via time-partitioned indices
Journal of Computer and System Sciences
FaRNet: Fast recognition of high-dimensional patterns from big network traffic data
Computer Networks: The International Journal of Computer and Telecommunications Networking
Adtributor: revenue debugging in advertising systems
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
Hi-index | 0.00 |
Data items archived in data warehouses or those that arrive online as streams typically have attributes which take values from multiple hierarchies (e.g., time and geographic location; source and destination IP addresses). Providing an aggregate view of such data is important to summarize, visualize, and analyze. We develop the aggregate view based on certain hierarchically organized sets of large-valued regions ("heavy hitters"). Such Hierarchical Heavy Hitters (HHHs) were previously introduced as a crucial aggregation technique in one dimension. In order to analyze the wider range of data warehousing applications and realistic IP data streams, we generalize this problem to multiple dimensions.We identify and study two variants of HHHs for multi-dimensional data, namely the "overlap" and "split" cases, depending on how an aggregate computed for a child node in the multi-dimensional hierarchy is propagated to its parent element(s). For data warehousing applications, we present offline algorithms that take multiple passes over the data and produce the exact HHHs. For data stream applications, we present online algorithms that find approximate HHHs in one pass, with provable accuracy guarantees.We show experimentally, using real and synthetic data, that our proposed online algorithms yield outputs which are very similar (virtually identical, in many cases) to their offline counterparts. The lattice property of the product of hierarchical dimensions ("diamond") is crucially exploited in our online algorithms to track approximate HHHs using only a small, fixed number of statistics per candidate node, regardless of the number of dimensions.