We study an important data analysis operator that extracts the k most important groups from data (i.e., the k groups with the highest aggregate values). In a data warehousing context, an example of such a query is "find the 10 combinations of product-type and month with the largest sum of sales". The problem is challenging because the potential number of groups can be much larger than the memory capacity. We propose on-demand methods for efficient top-k groups processing under limited memory. In particular, we design top-k groups retrieval techniques for three representative scenarios. For the scenario where data is physically ordered by measure, we propose the write-optimized multi-pass sorted access algorithm (WMSA), which exploits available memory for efficient top-k groups computation. For the scenario with unordered data, we develop the recursive hash algorithm (RHA), which applies hashing with early aggregation, coupled with branch-and-bound techniques and derivation heuristics for tight score bounds of hash partitions. Finally, we design the clustered groups algorithm (CGA), which accelerates top-k groups processing when data is clustered by a subset of the group-by attributes. Extensive experiments with real and synthetic datasets demonstrate the applicability and efficiency of the proposed algorithms.
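To make the query concrete, the following is a minimal in-memory sketch of top-k groups by sum, using hash partitioning with a simple bound-based pruning step. It is not the paper's RHA: the function name `top_k_groups`, the `num_partitions` parameter, and the use of each partition's total measure as its score bound are assumptions made for illustration, and the bound is valid only when all measures are non-negative.

```python
import heapq
from collections import defaultdict


def top_k_groups(rows, k, num_partitions=4):
    """Return the k groups with the largest sum as (sum, key) pairs.

    rows: iterable of (group_key, measure) pairs.
    Assumption: measures are non-negative, so the total measure of a
    partition is an upper bound on the sum of any group inside it.
    """
    if k <= 0:
        return []

    # Pass 1: hash-partition the rows and record each partition's total.
    partitions = defaultdict(list)
    bound = defaultdict(float)
    for key, measure in rows:
        p = hash(key) % num_partitions
        partitions[p].append((key, measure))
        bound[p] += measure

    best = []  # min-heap of (sum, key) holding at most k entries

    # Pass 2: visit partitions in descending order of their bound.
    for p in sorted(partitions, key=lambda q: bound[q], reverse=True):
        # Prune: no group in this or any later partition can exceed
        # the current k-th best sum.
        if len(best) == k and bound[p] <= best[0][0]:
            break
        # Aggregate the groups that fall into this partition.
        sums = defaultdict(float)
        for key, measure in partitions[p]:
            sums[key] += measure
        for key, total in sums.items():
            if len(best) < k:
                heapq.heappush(best, (total, key))
            elif total > best[0][0]:
                heapq.heapreplace(best, (total, key))

    return sorted(best, reverse=True)


# Hypothetical sales rows: ((product_type, month), sales)
sales = [
    (("tv", "jan"), 5.0),
    (("tv", "feb"), 3.0),
    (("radio", "jan"), 2.0),
    (("tv", "jan"), 4.0),
    (("radio", "feb"), 1.0),
    (("tv", "feb"), 1.0),
]
top2 = top_k_groups(sales, k=2)
```

In this sketch every partition is held in memory; in the limited-memory setting the abstract describes, partitions would instead be written to disk and re-read, which is where tighter bounds and early aggregation would matter.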