In this paper, we present a new, efficient approach called dynamic bucketing for querying and clustering very large data sets, a topic of great theoretical and practical significance. We partition the data into equal-width buckets and further partition dense buckets into sub-buckets as needed, efficiently reclaiming and allocating memory space. The bucketing process adapts dynamically to both the input order and the distribution of the data. We propose a new data structure, called structure trees, for storing aggregate information such as the histograms of buckets and sub-buckets. Because only count values are stored, the data structure is highly aggregated and concise, and it is well suited to data sets with a large number of records. We also provide new query evaluation and data clustering algorithms based on the structure trees. Our simulation results show that our approach is superior to the current leading approaches in terms of accuracy and performance.
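The abstract describes dynamic bucketing only at a high level. The Python sketch below shows one way the core idea could work, under assumptions of our own rather than the paper's. The value range [lo, hi] is known in advance, and every node has the same hypothetical `fanout` of equal-width buckets. A bucket gets its own sub-buckets once its count reaches a hypothetical `split_threshold`, and a range-count query assumes values are spread uniformly within a bucket. `Node`, `BucketTree`, and `count_range` are illustrative names, not the paper's structure trees. Memory reclamation and clustering are omitted.

```python
from typing import List, Optional, Tuple


class Node:
    """One level of equal-width buckets covering [lo, hi]; stores counts only."""

    def __init__(self, lo: float, hi: float, fanout: int) -> None:
        self.lo, self.hi, self.fanout = lo, hi, fanout
        self.width = (hi - lo) / fanout
        self.counts: List[int] = [0] * fanout                     # points counted at this level
        self.children: List[Optional["Node"]] = [None] * fanout   # sub-buckets of dense buckets

    def index(self, x: float) -> int:
        # Clamp so that x == hi and float rounding stay inside this node.
        return max(0, min(int((x - self.lo) / self.width), self.fanout - 1))

    def bounds(self, i: int) -> Tuple[float, float]:
        b_lo = self.lo + i * self.width
        return b_lo, b_lo + self.width


class BucketTree:
    """Hypothetical sketch of dynamic bucketing: split a bucket once it is dense."""

    def __init__(self, lo: float, hi: float, fanout: int = 10,
                 split_threshold: int = 100, max_depth: int = 4) -> None:
        self.root = Node(lo, hi, fanout)
        self.fanout = fanout
        self.split_threshold = split_threshold
        self.max_depth = max_depth
        self.total = 0

    def insert(self, x: float) -> None:
        if not (self.root.lo <= x <= self.root.hi):
            raise ValueError(f"{x} outside [{self.root.lo}, {self.root.hi}]")
        node, depth = self.root, 0
        # Descend through buckets that have already been split.
        while node.children[node.index(x)] is not None:
            node, depth = node.children[node.index(x)], depth + 1
        i = node.index(x)
        node.counts[i] += 1
        self.total += 1
        # A dense bucket gets its own equal-width sub-buckets; later points go there.
        # Points already counted stay at this level, since raw values are not kept.
        if node.counts[i] >= self.split_threshold and depth + 1 < self.max_depth:
            b_lo, b_hi = node.bounds(i)
            node.children[i] = Node(b_lo, b_hi, self.fanout)

    def count_range(self, a: float, b: float) -> float:
        """Estimate how many inserted points fall in [a, b)."""
        return self._count(self.root, a, b)

    def _count(self, node: Node, a: float, b: float) -> float:
        est = 0.0
        for i in range(node.fanout):
            b_lo, b_hi = node.bounds(i)
            overlap = min(b, b_hi) - max(a, b_lo)
            if overlap <= 0:
                continue
            # Counts at this level are assumed uniform within the bucket.
            est += node.counts[i] * overlap / node.width
            child = node.children[i]
            if child is not None:
                est += self._count(child, a, b)
        return est


t = BucketTree(0, 100, fanout=10, split_threshold=5, max_depth=3)
for v in range(100):
    t.insert(v)
print(t.count_range(0, 50))  # 50.0
print(t.count_range(0, 5))   # 2.5
```

Counts recorded before a bucket is split cannot be redistributed to its sub-buckets, because the raw observations are not stored. In the example, `count_range(0, 5)` therefore returns 2.5, although five of the inserted values lie in [0, 5). Queries aligned with bucket boundaries, such as `count_range(0, 50)`, are exact.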