Efficient top-k retrieval of records from a database has been an active research field for many years. We approach the problem from the point of view of real-world applications, in which the order of records according to some similarity function on an attribute is not unique: many records have the same values in several attributes, and thus their ranking in those attributes is arbitrary. For instance, in large person databases many individuals have the same first name, the same date of birth, or live in the same city. Existing algorithms, such as the Threshold Algorithm (TA), are ill-equipped to handle such cases efficiently. We introduce a variation of TA, the Bulk Sorted Access Algorithm (BSA), which retrieves larger chunks of records from the sorted lists using fixed thresholds, and which focuses its efforts on records that are ranked high in more than one ordering and are thus more promising candidates. We show experimentally that our method outperforms TA and another previous method for top-k retrieval in those very common cases.
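To make the baseline concrete, the following is a minimal sketch of the standard Threshold Algorithm that the abstract refers to, not of BSA, whose details are not given here. It assumes that the aggregation function is a plain sum of per-attribute similarity scores, that every record appears in every sorted list, and that random access is a dictionary lookup. The function name `threshold_algorithm` and the toy data are hypothetical and chosen only for illustration.

```python
import heapq


def threshold_algorithm(lists, k):
    """Sketch of TA with sum aggregation (an assumption of this example).

    lists: one list per attribute, each holding (record_id, score) pairs
           sorted by score in descending order; every record is assumed
           to appear in every list.
    Returns (top_k, depth): top_k is a list of (total, record_id) pairs,
    best first; depth is the number of sorted-access rounds performed.
    """
    # Random access: per-attribute lookup from record id to score.
    lookup = [dict(lst) for lst in lists]
    top = []      # min-heap holding the best k (total, record_id) so far
    seen = set()  # records whose total score has already been computed
    n = len(lists[0])

    for depth in range(n):
        last_scores = []
        # One round of sorted access: read position `depth` in each list.
        for lst in lists:
            rid, score = lst[depth]
            last_scores.append(score)
            if rid in seen:
                continue
            seen.add(rid)
            total = sum(d[rid] for d in lookup)
            if len(top) < k:
                heapq.heappush(top, (total, rid))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, rid))

        # No unseen record can score higher than this threshold.
        threshold = sum(last_scores)
        if len(top) == k and top[0][0] >= threshold:
            return sorted(top, reverse=True), depth + 1

    return sorted(top, reverse=True), n


# Hypothetical toy data: three records tie on first-name similarity,
# so their relative order in the first list is arbitrary.
first_name = [("a", 1.0), ("b", 1.0), ("c", 1.0), ("d", 0.2)]
city = [("c", 1.0), ("d", 0.9), ("a", 0.3), ("b", 0.1)]

top2, rounds = threshold_algorithm([first_name, city], k=2)
print(top2, rounds)
```

In this toy input the first list contains a run of tied scores, so the threshold drops only as fast as the second list allows, and TA has to read past the ties one record at a time before it can stop. This is the kind of situation the abstract describes as motivation for retrieving larger chunks per access.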