ParaLite: Supporting Collective Queries in Database System to Parallelize User-Defined Executable

Authors:
Ting Chen;Kenjiro Taura
Affiliations:
-;-
Venue:
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Year:
2012


Abstract

This paper proposes extensions to parallel database systems called collective queries and User-Defined eXecutables (UDX). A collective query is an SQL query whose results are distributed to multiple clients and then processed by them in parallel, using arbitrary external programs (user-defined executables). The intended applications are data-intensive workflows, typically built out of various independently developed executables and scripts. Collective queries make such workflows easier to describe by making data-parallel execution of external programs on big data easy and streamlined. They also give workflow developers a familiar and powerful language, SQL, for flexible data filtering and stereotypical data processing tasks. We implement this concept in a system called "ParaLite", a parallel database system based on the popular lightweight database SQLite. It is equipped with data transfer optimization algorithms that distribute query results to multiple clients, taking both communication cost and compute loads into account. We verified the correctness and performance of ParaLite, and the experimental results show that ParaLite performs well on SQL processing and scales well when parallelizing UDX.
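
The idea of a collective query can be illustrated with a minimal sketch. The code below is not ParaLite's interface or syntax, which the abstract does not describe. It only assumes the general pattern stated above: run an SQL query, split the result rows among several clients, and let each client pipe its share through an external program in parallel. The names `collective_query`, `partition`, `run_udx` and `UDX_COMMAND` are hypothetical, the round-robin split is a placeholder for the cost-aware data transfer optimization mentioned in the abstract, and a small Python one-liner stands in for a real user-defined executable.

```python
import sqlite3
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a user-defined executable: it reads lines on
# stdin and writes one upper-cased line per input line on stdout.
UDX_COMMAND = [
    sys.executable, "-c",
    "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())",
]


def partition(rows, num_clients):
    """Split query results round-robin into one chunk per client.

    Assumption: a real system would weigh communication cost and compute
    load here; round-robin is only a placeholder.
    """
    chunks = [[] for _ in range(num_clients)]
    for i, row in enumerate(rows):
        chunks[i % num_clients].append(row)
    return chunks


def run_udx(chunk):
    """Feed one chunk to the external program and collect its output lines."""
    if not chunk:
        return []
    # Serialize each row as one tab-separated line.
    payload = "".join("\t".join(str(v) for v in row) + "\n" for row in chunk)
    done = subprocess.run(UDX_COMMAND, input=payload, capture_output=True,
                          text=True, check=True)
    return done.stdout.splitlines()


def collective_query(conn, sql, num_clients):
    """Run `sql`, distribute the rows, and process each chunk in parallel."""
    rows = conn.execute(sql).fetchall()
    chunks = partition(rows, num_clients)
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        per_client = list(pool.map(run_udx, chunks))
    # Flatten the per-client outputs, in client order.
    return [line for lines in per_client for line in lines]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER, body TEXT)")
    conn.executemany("INSERT INTO docs VALUES (?, ?)",
                     [(1, "alpha"), (2, "beta"), (3, "gamma"), (4, "delta")])
    # SQL does the filtering; the external program does the processing.
    return collective_query(
        conn, "SELECT body FROM docs WHERE id > 1 ORDER BY id", 2)


if __name__ == "__main__":
    print(demo())
```

In this sketch the clients are threads on one machine that each launch a subprocess. In a parallel database setting the clients would be separate nodes, and the choice of which rows go to which client is where the data transfer optimization would apply.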