Ranking objects based on relationships and fixed associations

Authors:
Albert Angel;Surajit Chaudhuri;Gautam Das;Nick Koudas
Affiliations:
University of Toronto;Microsoft Research;University of Texas at Arlington;University of Toronto
Venue:
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Year:
2009

Citing 13

Join synopses for approximate query answering
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data

Optimal aggregation algorithms for middleware
PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

RankSQL: query algebra and optimization for relational top-k queries
Proceedings of the 2005 ACM SIGMOD international conference on Management of data

Supporting ad-hoc ranking aggregates
Proceedings of the 2006 ACM SIGMOD international conference on Management of data

Ranking objects based on relationships
Proceedings of the 2006 ACM SIGMOD international conference on Management of data

IO-Top-k: index-access optimized top-k query processing
VLDB '06 Proceedings of the 32nd international conference on Very large data bases

Adaptive rank-aware query optimization in relational databases
ACM Transactions on Database Systems (TODS)

Supporting top-K join queries in relational databases
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29

Depth estimation for ranking query optimization
VLDB '07 Proceedings of the 33rd international conference on Very large data bases

BlogScope: a system for online analysis of high volume text streams
VLDB '07 Proceedings of the 33rd international conference on Very large data bases

Ad-hoc aggregations of ranked lists in the presence of hierarchies
Proceedings of the 2008 ACM SIGMOD international conference on Management of data

Freebase: a collaboratively created graph database for structuring human knowledge
Proceedings of the 2008 ACM SIGMOD international conference on Management of data

Evaluating rank joins with optimal cost
Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Cited 5

Breaking out of the box of recommendations: from items to packages
Proceedings of the fourth ACM conference on Recommender systems

Efficient diversity-aware search
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

Diversified ranking on large graphs: an optimization viewpoint
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

TEXplorer: keyword-based object search and exploration in multidimensional text databases
Proceedings of the 20th ACM international conference on Information and knowledge management

On the complexity of package recommendation problems
PODS '12 Proceedings of the 31st symposium on Principles of Database Systems

Quantified Score

Hi-index	0.00

Abstract

Text corpora are often enhanced with additional metadata that relate real-world entities to each document in which those entities are discussed. Such relationships are typically obtained through widely available Information Extraction tools. At the same time, interesting known associations typically hold among these entities. For instance, a corpus might contain discussions of hotels, cities and airlines; fixed associations among these entities may include: airline A operates a flight to city C, hotel H is located in city C. Many applications require identifying associated entities, each of which best matches a given set of keywords. Consider the sample query: Find a holiday package in a "pet-friendly" hotel, located in a "historical" yet "lively" city, with travel operated by an "economical" and "safe" airline. These keywords are unlikely to occur in the textual descriptions of the entities themselves (e.g., the actual hotel name, city name, or airline name). Consequently, to answer such queries, one needs to exploit both the relationships between entities and documents (e.g., the keyword "pet-friendly" occurs in a document that contains an entity specifying a hotel name H) and the known associations between entities (e.g., hotel H is located in city C). In this work, we focus on the class of "entity package finder" queries outlined above. We demonstrate that existing techniques cannot be efficiently adapted to solve this problem, as the resulting algorithm relies on estimations with excessive runtime and/or storage overheads. We propose an efficient algorithm to process such queries over large corpora. We devise early pruning and termination strategies, in the presence of joins and aggregations (executed on entities extracted from text), that do not depend on any estimates. Our analysis and experimental evaluation on real and synthetic data demonstrate the efficiency and scalability of our approach.
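
To make the query class concrete, the following is a minimal sketch of an "entity package finder" query evaluated by exhaustive enumeration. It is not the algorithm proposed in the paper, which relies on early pruning and termination; it is only a naive baseline for illustration. The toy corpus, the entity identifiers, the association tables `LOCATED_IN` and `FLIES_TO`, and the scoring rule (count keyword hits in documents that mention the entity, then sum over the package) are all assumptions introduced here and do not come from the abstract.

```python
from itertools import product

# Hypothetical toy corpus: each document carries its text and the
# entities an extraction tool is assumed to have found in it.
DOCS = [
    {"text": "a pet-friendly place to stay", "entities": {"hotel:H1"}},
    {"text": "pet-friendly rooms and staff", "entities": {"hotel:H1"}},
    {"text": "pet-friendly but noisy", "entities": {"hotel:H2"}},
    {"text": "a historical and lively old town", "entities": {"city:C1"}},
    {"text": "lively nightlife", "entities": {"city:C2"}},
    {"text": "an economical and safe carrier", "entities": {"airline:A1"}},
    {"text": "economical fares", "entities": {"airline:A2"}},
]

# Hypothetical fixed associations between entities.
LOCATED_IN = {("hotel:H1", "city:C1"), ("hotel:H2", "city:C2")}
FLIES_TO = {
    ("airline:A1", "city:C1"),
    ("airline:A2", "city:C1"),
    ("airline:A2", "city:C2"),
}


def entity_score(entity, keywords):
    """Assumed scoring rule: number of (document, keyword) pairs in which
    the document mentions the entity and its text contains the keyword."""
    return sum(
        1
        for doc in DOCS
        if entity in doc["entities"]
        for kw in keywords
        if kw in doc["text"]
    )


def top_packages(hotel_kws, city_kws, airline_kws, k=1):
    """Enumerate every (hotel, city, airline) triple, keep those that satisfy
    both fixed associations, and rank them by summed per-entity scores."""
    hotels = {h for h, _ in LOCATED_IN}
    cities = {c for _, c in LOCATED_IN} | {c for _, c in FLIES_TO}
    airlines = {a for a, _ in FLIES_TO}
    packages = []
    for h, c, a in product(hotels, cities, airlines):
        if (h, c) in LOCATED_IN and (a, c) in FLIES_TO:
            score = (
                entity_score(h, hotel_kws)
                + entity_score(c, city_kws)
                + entity_score(a, airline_kws)
            )
            packages.append((score, h, c, a))
    # Highest score first; ties are broken by entity identifiers.
    packages.sort(key=lambda p: (-p[0], p[1:]))
    return packages[:k]


best = top_packages(["pet-friendly"], ["historical", "lively"], ["economical", "safe"])
```

This baseline scores every candidate triple before ranking, so its cost grows with the product of the entity sets. That exhaustive work is what pruning and early-termination strategies of the kind the abstract mentions are meant to avoid.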