Proximity rank join

Authors:
Davide Martinenghi;Marco Tagliasacchi
Affiliations:
Politecnico di Milano, Milano, Italy;Politecnico di Milano, Milano, Italy
Venue:
Proceedings of the VLDB Endowment
Year:
2010

Abstract

We introduce the proximity rank join problem, where we are given a set of relations whose tuples are equipped with a score and a real-valued feature vector. Given a target feature vector, the goal is to return the K combinations of tuples with high scores that are as close as possible to the target and to each other, according to some notion of distance. The setting closely resembles that of traditional rank join, but the geometry of the vector space plays a distinctive role in the computation of the overall score of a combination. Also, the input relations typically return their results either by distance from the target or by score. Because of these aspects, it turns out that traditional rank join algorithms, such as the well-known HRJN, have shortcomings in solving the proximity rank join problem, as they may read more input than needed. To overcome this weakness, we define a tight bound (used as a stopping criterion) that guarantees instance optimality, i.e., an I/O cost is achieved that is always within a constant factor of optimal. The tight bound can also be used to drive an adaptive pulling strategy, deciding at each step which relation to access next. For practically relevant classes of problems, we show how to compute the tight bound efficiently. An extensive experimental study validates our results and demonstrates significant gains over existing solutions.
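
The problem statement in the abstract can be made concrete with a small brute-force baseline. The sketch below is illustrative only and is not the algorithm from the paper: the aggregation function (weighted scores, minus Euclidean distance to the target, minus Euclidean distance to the combination's centroid), the weights, and all function names are assumptions chosen for the example. It enumerates every combination, which is exactly the exhaustive input reading that a tight bound and stopping criterion are meant to avoid.

```python
import heapq
import itertools
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))


def combination_score(combo, target, w_score=1.0, w_target=1.0, w_mutual=1.0):
    """Assumed aggregate score of one combination of (score, vector) tuples.

    Rewards high tuple scores, penalizes distance from the target, and
    penalizes spread (distance of each tuple from the combination centroid).
    This particular form is a hypothetical choice, not taken from the source.
    """
    vectors = [vec for _, vec in combo]
    center = centroid(vectors)
    total = 0.0
    for score, vec in combo:
        total += w_score * score
        total -= w_target * math.dist(vec, target)
        total -= w_mutual * math.dist(vec, center)
    return total


def brute_force_proximity_rank_join(relations, target, k):
    """Return the k best (aggregate_score, combination) pairs.

    Each relation is a list of (score, feature_vector) tuples. Every
    combination with one tuple per relation is scored, so the cost grows
    with the product of the relation sizes.
    """
    scored = (
        (combination_score(combo, target), combo)
        for combo in itertools.product(*relations)
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])
```

For example, with two relations of two tuples each and a target at the origin, the combination whose tuples both sit on the target and carry the highest scores is ranked first:

```python
relations = [
    [(0.9, (0.0, 0.0)), (0.5, (5.0, 5.0))],
    [(0.8, (0.0, 0.0)), (0.7, (10.0, 10.0))],
]
best = brute_force_proximity_rank_join(relations, target=(0.0, 0.0), k=1)
```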