Adaptive Request Scheduling for Parallel Scientific Web Services

Authors:
Heshan Lin;Xiaosong Ma;Jiangtian Li;Ting Yu;Nagiza Samatova
Affiliations:
Department of Computer Science, North Carolina State University;Department of Computer Science, North Carolina State University, and Computer Science and Mathematics Division, Oak Ridge National Laboratory;Department of Computer Science, North Carolina State University;Department of Computer Science, North Carolina State University;Department of Computer Science, North Carolina State University, and Computer Science and Mathematics Division, Oak Ridge National Laboratory
Venue:
SSDBM '08 Proceedings of the 20th international conference on Scientific and Statistical Database Management
Year:
2008

Abstract

Scientific web services often possess data models and query workloads quite different from commercial ones and are much less studied. Individual queries have to be processed in parallel by multiple server nodes because the processing is both computation- and data-intensive. Meanwhile, each query is performed against portions of a large, common dataset. Existing scheduling policies from traditional environments (namely cluster web servers and supercomputers) consider either the data aspect or the computation aspect, but not both, and are therefore inadequate for this new type of workload. In this paper, we systematically investigate adaptive scheduling for scientific web services, taking into account parallel computation scalability, data locality, and load balancing. Our case study focuses on high-throughput query processing on biological sequence databases, a fundamental task performed daily by millions of scientists, who increasingly prefer to use web services powered by parallel servers. Our research indicates that intelligent resource allocation and scheduling are crucial to improving the overall performance of a parallel sequence database search server. Failure to consider either parallel computation scalability or data locality can significantly hurt system throughput and query response time. Also, no single static strategy works best for all request workloads or all resource settings. In response, we present several dynamic scheduling techniques that automatically adapt to the request workload and system configuration when making scheduling decisions. Experiments on a cluster using 32 processors show that the combination of these techniques delivers a several-fold improvement in average query response time across various workloads.