Distributed monitoring applications often involve streams of unique identifiers (UIDs) such as IP addresses or RFID tag IDs. An important class of queries for such applications partitions the UIDs into groups using a large lookup table and then performs aggregation over the groups. We propose using histograms to reduce bandwidth utilization in such settings, using a histogram partitioning function as a compact representation of the lookup table. We investigate methods for constructing histogram partitioning functions for lookup tables over unique identifiers that form a hierarchy of contiguous groups, as is the case with network addresses and several other types of UID. Each bucket in our histograms corresponds to a subtree of the hierarchy. We develop three novel classes of partitioning functions for this domain, which vary in their structure, construction time, and estimation accuracy.

Our approach provides several advantages over previous work. We show that optimal instances of our partitioning functions can be constructed efficiently from large lookup tables. The partitioning functions are also compact, with each partition represented by a single identifier. Finally, our algorithms support minimizing any error metric that can be expressed as a distributive aggregate, and they extend naturally to multiple hierarchical dimensions. In experiments on real-world network monitoring data, we show that our histograms provide significantly higher accuracy per bit than existing techniques.
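To make the idea of a subtree-based partitioning function concrete, the following is a minimal sketch, not the paper's construction algorithm. It assumes IPv4 addresses as the UIDs, a hand-picked and purely hypothetical set of buckets (`BUCKETS`), and a most-specific-prefix rule for resolving nested subtrees. The names `bucket_for` and `summarize` are illustrative only. Each bucket is identified by a single CIDR prefix, which is what keeps the partitioning function compact.

```python
import ipaddress
from collections import Counter

# Hypothetical bucket list (an assumption for illustration): each bucket is
# one subtree of the IPv4 address hierarchy, named by a single CIDR prefix.
BUCKETS = [
    ipaddress.ip_network(prefix)
    for prefix in ("0.0.0.0/0", "10.0.0.0/8", "10.1.0.0/16")
]


def bucket_for(address, buckets=BUCKETS):
    """Return the most specific bucket whose subtree contains the address.

    Resolving nested buckets by longest prefix is an assumption of this
    sketch, not a rule taken from the abstract.
    """
    addr = ipaddress.ip_address(address)
    matches = [b for b in buckets if addr in b]
    if not matches:
        raise ValueError(f"no bucket covers {address}")
    return max(matches, key=lambda b: b.prefixlen)


def summarize(stream, buckets=BUCKETS):
    """Count the UIDs of a stream per bucket (a simple COUNT aggregate)."""
    counts = Counter()
    for address in stream:
        counts[str(bucket_for(address, buckets))] += 1
    return dict(counts)


# A monitor would ship only the per-bucket counts, not the raw UIDs.
stream = ["10.1.2.3", "10.1.9.9", "10.2.0.1", "192.168.0.1"]
histogram = summarize(stream)
print(histogram)
```

In this sketch the bandwidth saving comes from sending one counter per bucket instead of one record per UID. Choosing which subtrees to use as buckets so that a given error metric is minimized is the problem the abstract describes, and the sketch does not attempt it.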