Proceedings of the 21st ACM international conference on Information and knowledge management
From stars to galaxies: skyline queries on aggregate data
Proceedings of the 16th International Conference on Extending Database Technology
Aggregation is among the core functionalities of OLAP systems. Aggregate queries are frequently issued in decision support systems to identify interesting groups of data. When more than one aggregation function is involved and the notion of interest is not clearly defined, skyline queries provide a robust mechanism for capturing the potentially interesting points: (i) users do not need to specify a ranking function, and (ii) the result is independent of the dimension scales. To provide better exploration functionalities in OLAP systems, we propose to use skyline queries over aggregated data to identify the most interesting groups. Since aggregation functions have to be ad hoc to cover a wide variety of user interests, the skyline over the aggregates has to be computed on the fly. Hence, any algorithm that computes such a skyline must be fast and must produce the result set progressively, with potential skyline groups emitted as early as possible. We explore a family of algorithms that try to consume only as many data records as are necessary to compute the skyline, and we design an optimal algorithm. We further refine the algorithm by taking into account systems issues such as disk behavior, which are often ignored but have a strong impact on real system performance. Experimental results validate the performance and progressive benefits of our algorithm.
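To make the notion of a skyline over aggregates concrete, the following is a minimal Python sketch of a naive baseline. It is not the algorithm proposed in the paper: it reads every record before producing any output, which is exactly the cost a progressive algorithm tries to avoid. The function names, the sample data, and the convention that larger aggregate values are better are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate(records, agg_funcs):
    """Group (key, value) records and apply every aggregation function per group."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return {key: tuple(f(vals) for f in agg_funcs) for key, vals in groups.items()}


def dominates(a, b):
    """a dominates b if a is no worse in every dimension and strictly
    better in at least one (assumption: larger values are better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def aggregate_skyline(records, agg_funcs):
    """Naive baseline: aggregate all records, then keep the non-dominated groups."""
    points = aggregate(records, agg_funcs)
    return {
        key: p
        for key, p in points.items()
        if not any(dominates(q, p) for q in points.values())
    }


def mean(vals):
    return sum(vals) / len(vals)


# Hypothetical sales records: (region, amount).
sales = [
    ("north", 10), ("north", 30),
    ("south", 25), ("south", 5),
    ("east", 8), ("east", 9),
    ("west", 35),
]

# Skyline over two ad hoc aggregates: total sales and average sale.
result = aggregate_skyline(sales, [sum, mean])
print(result)
```

In this sample, "north" has the highest total and "west" the highest average, so neither dominates the other and both are in the skyline, while "south" and "east" are dominated by "north". No ranking function that weighs total against average was needed, and rescaling either aggregate by a positive constant would leave the result unchanged.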