Sharing work in keyword search over databases

Authors:
Marie Jacob;Zachary Ives
Affiliations:
University Of Pennsylvania, Philadelphia, PA, USA;University Of Pennsylvania, Philadelphia, PA, USA
Venue:
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Year:
2011

Citing 30
Cited 3

Multiple-query optimization

ACM Transactions on Database Systems (TODS)
Referral Web: combining social networks and collaborative filtering

Communications of the ACM
Integration of heterogeneous databases without common domains using queries based on textual similarity

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Efficient and extensible algorithms for multi query optimization

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
NiagaraCQ: a scalable continuous query system for Internet databases

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Rate-based query optimization for streaming information sources

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
A Pipeline N-Way Join Algorithm Based on the 2-Way Semijoin Program

IEEE Transactions on Knowledge and Data Engineering
The Volcano Optimizer Generator: Extensibility and Efficient Search

Proceedings of the Ninth International Conference on Data Engineering
Answering queries using views: A survey

The VLDB Journal — The International Journal on Very Large Data Bases
Text joins in an RDBMS for web data integration

WWW '03 Proceedings of the 12th international conference on World Wide Web
Optimal aggregation algorithms for middleware

Journal of Computer and System Sciences - Special issu on PODS 2001
Keyword Searching and Browsing in Databases using BANKS

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Algorithms and applications for answering ranked queries using ranked views

The VLDB Journal — The International Journal on Very Large Data Bases
Evaluating top-k queries over web-accessible databases

ACM Transactions on Database Systems (TODS)
RankSQL: query algebra and optimization for relational top-k queries

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Context-sensitive information retrieval using implicit feedback

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Bidirectional expansion for keyword search on graph databases

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Adaptive rank-aware query optimization in relational databases

ACM Transactions on Database Systems (TODS)
BLINKS: ranked keyword searches on graphs

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Efficient exploitation of similar subexpressions for query processing

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Discover: keyword search in relational databases

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Supporting top-K join queries in relational databases

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Efficient IR-style keyword search over relational databases

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Depth estimation for ranking query optimization

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Adaptive query processing

Foundations and Trends in Databases
Evaluating rank joins with optimal cost

Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Context-aware query suggestion by mining click-through and session data

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Learning to create data-integrating queries

Proceedings of the VLDB Endowment
NAGA: Searching and Ranking Knowledge

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Automatically incorporating new sources in keyword search-based data integration

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data

Processing a large number of continuous preference top-k queries

SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Actively soliciting feedback for query answers in keyword search-based data integration

Proceedings of the VLDB Endowment
Database Keyword Search: A Perspective from Optimization

WI-IAT '12 Proceedings of the The 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 03

Quantified Score

Hi-index	0.00

Visualization

Abstract

An important means of allowing non-expert end-users to pose ad hoc queries whether over single databases or data integration systems is through keyword search. Given a set of keywords, the query processor finds matches across different tuples and tables. It computes and executes a set of relational sub-queries whose results are combined to produce the k highest ranking answers. Work on keyword search primarily focuses on single-database, single-query settings: each query is answered in isolation, despite possible overlap between queries posed by different users or at different times; and the number of relevant tables is assumed to be small, meaning that sub-queries can be processed without using cost-based methods to combine work. As we apply keyword search to support ad hoc data integration queries over scientific or other databases on the Web, we must reuse and combine computation. In this paper, we propose an architecture that continuously receives sets of ranked keyword queries, and seeks to reuse work across these queries. We extend multiple query optimization and continuous query techniques, and develop a new query plan scheduling module we call the ATC (based on its analogy to an air traffic controller). The ATC manages the flow of tuples among a multitude of pipelined operators, minimizing the work needed to return the top-k answers for all queries. We also develop techniques to manage the sharing and reuse of state as queries complete and input data streams are exhausted. We show the effectiveness of our techniques in handling queries over real and synthetic data sets.