Multidimensional binary search trees used for associative searching. Communications of the ACM.
Optimal aggregation algorithms for middleware. PODS '01: Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
DynDex: a dynamic and non-metric space indexer. Proceedings of the tenth ACM international conference on Multimedia.
Efficient Progressive Skyline Computation. Proceedings of the 27th International Conference on Very Large Data Bases.
Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences, Special issue on PODS 2001.
An optimal and progressive algorithm for skyline queries. Proceedings of the 2003 ACM SIGMOD international conference on Management of data.
Progressive skyline computation in database systems. ACM Transactions on Database Systems (TODS), Special Issue: SIGMOD/PODS 2003.
Maximal vector computation in large data sets. VLDB '05: Proceedings of the 31st international conference on Very large data bases.
Similarity Search: The Metric Space Approach (Advances in Database Systems).
SUBSKY: Efficient Computation of Skylines in Subspaces. ICDE '06: Proceedings of the 22nd International Conference on Data Engineering.
IO-Top-k: index-access optimized top-k query processing. VLDB '06: Proceedings of the 32nd international conference on Very large data bases.
Shooting stars in the sky: an online algorithm for skyline queries. VLDB '02: Proceedings of the 28th international conference on Very Large Data Bases.
Efficient online top-K retrieval with arbitrary similarity measures. EDBT '08: Proceedings of the 11th international conference on Extending database technology: Advances in database technology.
A clustering based approach for skyline diversity. Expert Systems with Applications: An International Journal.
A skyline query returns the set of objects that are not dominated by any other object. An object is said to dominate another if it is closer to the query than the latter on all factors under consideration. In this paper, we consider the case where the similarity measures may be arbitrary and do not necessarily come from a metric space. We first explore middleware algorithms, analyze how skyline retrieval for non-metric spaces can be done on the middleware backend, and lay down a necessary and sufficient stopping condition for middleware-based skyline algorithms. We develop the Balanced Access Algorithm (BAA), which is provably more IO-friendly than the state-of-the-art algorithm for skyline query processing on middleware, and show that BAA outperforms the latter by orders of magnitude. We also show that, without prior knowledge about data distributions, a middleware algorithm is unlikely to be more IO-friendly than BAA. In fact, we empirically show that BAA is very close to the absolute lower bound of IO costs for middleware algorithms. Further, we explore the non-middleware setting and devise an online algorithm for skyline retrieval that uses a recently proposed value space index over non-metric spaces (AL-Tree [10]). The AL-Tree-based algorithm prunes subspaces and maintains candidate sets efficiently, which leads to better performance. We compare our algorithms with existing ones that can work with arbitrary similarity measures and show that our approaches incur lower computational and disk access costs, leading to significantly better response times.
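To make the skyline definition concrete, here is a minimal sketch of a naive window-based (block-nested-loop style) skyline over arbitrary per-factor distance functions. It is not one of the algorithms described above. The names `dominates` and `skyline` are illustrative, and the dominance test assumes the common convention that one object dominates another when it is no farther on every factor and strictly closer on at least one. The measures need not be metrics; the example uses a squared difference, which violates the triangle inequality.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
Distance = Callable[[T, T], float]  # any dissimilarity, metric or not


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no farther than b on every factor and strictly closer on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(objects: Sequence[T], query: T, measures: Sequence[Distance]) -> List[int]:
    """Return indices of objects not dominated by any other object."""
    # Score every object against the query with each (possibly non-metric) measure.
    scores = [[m(query, o) for m in measures] for o in objects]
    window: List[int] = []  # indices of current non-dominated candidates
    for i, s in enumerate(scores):
        if any(dominates(scores[j], s) for j in window):
            continue  # s is dominated by a candidate, so discard it
        # Evict candidates that the new object dominates, then keep it.
        window = [j for j in window if not dominates(s, scores[j])]
        window.append(i)
    return window


# Usage: two factors, one of them a non-metric squared difference.
measures = [lambda q, o: abs(q[0] - o[0]), lambda q, o: (q[1] - o[1]) ** 2]
objs = [(1, 5), (2, 1), (3, 3), (1, 6)]
print(skyline(objs, (0, 0), measures))  # [0, 1]
```

Here object 2 is dominated by object 1 and object 3 by object 0, so only indices 0 and 1 remain. The naive scan scores every object against every measure, which is exactly the cost that middleware and index-based approaches try to avoid.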
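In the middleware setting, each factor is exposed as a list sorted by distance to the query, read by sorted access, with random access to look up an object's distance on the other factors. The sketch below uses round-robin sorted access with a simple dominance-based stopping test. Assume `last[i]` is the most recent distance read from list i. Reading can stop once some seen object is no farther than `last[i]` on every list and strictly closer on at least one, because any unseen object must then be dominated by it. This is a sufficient condition chosen for illustration only. It is not claimed to be the paper's necessary and sufficient condition, nor the Balanced Access Algorithm, whose access schedule is not described here. The function name, the list layout, and the dict-based simulation of random access are all hypothetical, and every object is assumed to appear in every list.

```python
from typing import Dict, List, Sequence, Set, Tuple

SortedList = Sequence[Tuple[str, float]]  # (object_id, distance), ascending distance


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def middleware_skyline(lists: Sequence[SortedList]) -> Tuple[Set[str], int]:
    """Return (skyline object ids, number of sorted accesses performed)."""
    m = len(lists)
    lookup = [dict(lst) for lst in lists]  # hypothetical stand-in for random access
    depth = [0] * m                        # next position to read in each list
    last = [float("-inf")] * m             # last distance read by sorted access
    seen: Dict[str, List[float]] = {}      # full score vectors of seen objects
    accesses = 0

    def can_stop() -> bool:
        # Every unseen object has distance >= last[i] on each list, so it is
        # dominated by any seen object satisfying this test.
        return any(
            all(s[i] <= last[i] for i in range(m)) and any(s[i] < last[i] for i in range(m))
            for s in seen.values()
        )

    while True:
        progressed = False
        for i in range(m):  # one round-robin pass of sorted accesses
            if depth[i] < len(lists[i]):
                oid, dist = lists[i][depth[i]]
                depth[i] += 1
                accesses += 1
                last[i] = dist
                progressed = True
                if oid not in seen:  # fetch remaining factors by random access
                    seen[oid] = [lookup[j][oid] for j in range(m)]
        if not progressed or can_stop():
            break

    ids = list(seen)
    sky = {a for a in ids if not any(dominates(seen[b], seen[a]) for b in ids if b != a)}
    return sky, accesses


# Usage: scores a=(1,4), b=(2,2), c=(3,1), d=(4,5).
lists = [
    [("a", 1), ("b", 2), ("c", 3), ("d", 4)],
    [("c", 1), ("b", 2), ("a", 4), ("d", 5)],
]
print(middleware_skyline(lists))  # ({'a', 'b', 'c'}, 6) (set order may vary)
```

After three rounds, object b satisfies the stopping test (2 <= 3 and 2 <= 4, strictly on both), so d is never read. The final skyline is computed only over seen objects, since each unseen object is dominated by a seen one.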