Ten thousand SQLs: parallel keyword queries computing

Authors:
Lu Qin; Jeffrey Xu Yu; Lijun Chang
Affiliations:
The Chinese University of Hong Kong; The Chinese University of Hong Kong; The Chinese University of Hong Kong
Venue:
Proceedings of the VLDB Endowment
Year:
2010

Abstract

Keyword search in relational databases has been extensively studied. Given a relational database, a keyword query finds a set of interconnected tuple structures connected by foreign key references. On an RDBMS, a keyword query is processed in two steps, namely candidate network (CN) generation and CN evaluation, where a CN is an SQL query. A keyword query commonly needs to be processed using over 10,000 SQL queries. There are several approaches to processing a keyword query on an RDBMS, but the performance achievable on a uniprocessor architecture is limited. In this paper, we study the parallel computing of keyword queries on a multicore architecture. We give three observations on keyword query computing, namely a large number of SQL queries that need to be processed, a high possibility of sharing among SQL queries, and large intermediate results with a small number of final results. All of these make parallel keyword query computing challenging. We investigate three approaches. We first study query-level parallelism, where each SQL query is processed by one core. We distribute the SQL queries to different cores based on three objectives: minimizing workload skew, minimizing inter-core sharing, and maximizing intra-core sharing. Such an approach risks load imbalance because errors in cost estimation accumulate. We then study operation-level parallelism, where each operation of an SQL query is processed by one core. All operations are processed in stages, and in each stage the costs of operations are re-estimated to reduce the accumulated error. Operation-level parallelism still suffers from workload skew when large operations are involved and a large number of cores are used. Finally, we propose a new algorithm that partitions relations adaptively in order to minimize the extra cost of partitioning and at the same time reduce workload skew.
We conducted extensive performance studies using two large real datasets, DBLP and IMDB, and we report the efficiency of our approaches in this paper.
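
As a rough illustration of the workload-skew objective in query-level parallelism, the sketch below assigns SQL queries to cores with a greedy longest-processing-time heuristic. It is not the algorithm from the paper: it ignores inter-core and intra-core sharing entirely, and the names `assign_queries`, `core_loads`, and the per-query cost estimates are assumptions made for this example.

```python
import heapq
from typing import Dict, List


def assign_queries(costs: Dict[str, float], num_cores: int) -> List[List[str]]:
    """Assign each query to one core, placing expensive queries first.

    `costs` maps a (hypothetical) query name to its estimated cost.
    Only workload skew is considered; sharing between queries is ignored.
    """
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")

    # Min-heap of (current load, core id): the least-loaded core is on top.
    heap = [(0.0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(num_cores)]

    # Visit queries from most to least expensive; ties are broken by name
    # so that the result is deterministic.
    for name, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, core = heapq.heappop(heap)
        assignment[core].append(name)
        heapq.heappush(heap, (load + cost, core))

    return assignment


def core_loads(costs: Dict[str, float], assignment: List[List[str]]) -> List[float]:
    """Total estimated cost placed on each core."""
    return [sum(costs[q] for q in queries) for queries in assignment]


if __name__ == "__main__":
    # Made-up cost estimates for five candidate-network queries.
    estimated = {"cn1": 8, "cn2": 7, "cn3": 6, "cn4": 5, "cn5": 4}
    plan = assign_queries(estimated, num_cores=2)
    print(plan, core_loads(estimated, plan))
```

Because the assignment is fixed up front from estimated costs, any estimation error carries through to the final per-core loads, which is the load-imbalance risk the abstract attributes to query-level parallelism.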