Probabilistic counting algorithms for data base applications
Journal of Computer and System Sciences
An overview of data warehousing and OLAP technology
ACM SIGMOD Record
Min-wise independent permutations (extended abstract)
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs
SIAM Journal on Scientific Computing
Combining fuzzy information from multiple systems
Journal of Computer and System Sciences
Vector-space ranking with effective early termination
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Efficient algorithms for document retrieval problems
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Top-k selection queries over relational databases: Mapping strategies and performance evaluation
ACM Transactions on Database Systems (TODS)
Optimizing Multi-Feature Queries for Image Databases
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Counting Distinct Elements in a Data Stream
RANDOM '02 Proceedings of the 6th International Workshop on Randomization and Approximation Techniques
Optimal aggregation algorithms for middleware
Journal of Computer and System Sciences - Special issu on PODS 2001
Efficient top-K query calculation in distributed networks
Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing
Supporting top-k join queries in relational databases
The VLDB Journal — The International Journal on Very Large Data Bases
RankSQL: query algebra and optimization for relational top-k queries
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
KLEE: a framework for distributed top-k query algorithms
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Type less, find more: fast autocompletion search with a succinct index
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Answering top-k queries with multi-dimensional selections: the ranking cube approach
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Top-k query evaluation with probabilistic guarantees
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
A survey of top-k query processing techniques in relational database systems
ACM Computing Surveys (CSUR)
Database Systems: The Complete Book
Database Systems: The Complete Book
P-Cube: Answering Preference Queries in Multi-Dimensional Space
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
A workload-driven unit of cache replacement for mid-tier database caching
DASFAA'07 Proceedings of the 12th international conference on Database systems for advanced applications
DBpedia: a nucleus for a web of open data
ISWC'07/ASWC'07 Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference
Music Recommendation and Discovery: The Long Tail, Long Fail, and Long Play in the Digital Music Space
Schism: a workload-driven approach to database replication and partitioning
Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment
Picasso - to sing, you must close your eyes and draw
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Faster top-k document retrieval using block-max indexes
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Top-K color queries for document retrieval
Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms
Hi-index | 0.00 |
Focusing on the top-K items according to a ranking criterion constitutes an important functionality in many different query answering scenarios. The idea is to read only the necessary information---mostly from secondary storage---with the ultimate goal to achieve low latency. In this work, we consider processing such top-K queries under the constraint that the result items are members of a specific set, which is provided at query time. We call this restriction a set-defined selection criterion. Set-defined selections drastically influence the pros and cons of an id-ordered index vs. a score-ordered index. We present a mathematical model that allows to decide at runtime which index to choose, leading to a combined index. To improve the latency around the break even point of the two indices, we show how to benefit from a partitioned score-ordered index and present an algorithm to create such partitions based on analyzing query logs. Further performance gains can be enjoyed using approximate top-K results, with tunable result quality. The presented approaches are evaluated using both real-world and synthetic data.