JAWS: Job-Aware Workload Scheduling for the Exploration of Turbulence Simulations

Authors:
Xiaodan Wang; Eric Perlman; Randal Burns; Tanu Malik; Tamas Budavári; Charles Meneveau; Alexander Szalay
Venue:
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Year:
2010

Abstract

We present JAWS, a job-aware, data-driven batch scheduler that improves query throughput for data-intensive scientific database clusters. As datasets reach petabyte scale, workloads that scan through vast amounts of data to extract features are gaining importance in the sciences. However, acute performance bottlenecks result when multiple queries execute simultaneously and compete for I/O resources. Our solution, JAWS, divides queries into I/O-friendly sub-queries for scheduling. It then identifies overlapping data requirements within the workload and executes sub-queries in batches to maximize data sharing and reduce redundant I/O. JAWS extends our previous work by supporting workflows in which queries exhibit data dependencies, exploiting workload knowledge to coordinate caching decisions, and combating starvation through adaptive and incremental trade-offs between query throughput and response time. Implementing JAWS in the Turbulence Database Cluster yields a nearly three-fold improvement in query throughput when contention in the workload is high.
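The batching idea described above can be illustrated with a small greedy scheduler. This is a minimal sketch under stated assumptions, not the JAWS algorithm. The `Query` class and the `schedule` function are hypothetical, as are the following choices: a named "atom" as the unit of I/O, a unit cost per atom read, and the scoring rule with its `alpha` aging weight. The sketch decomposes each query into one sub-query per atom. It then repeatedly reads the atom with the most waiting sub-queries and serves all of them from that single read. A positive `alpha` adds the age of the oldest waiting query to an atom's score, which gives up some sharing in exchange for lower response time.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical query: an ID, an arrival time, and the data atoms it must read."""
    qid: str
    arrival: int
    atoms: set
    remaining: set = field(init=False)

    def __post_init__(self):
        # Decompose the query into one sub-query per data atom (assumed unit of I/O).
        self.remaining = set(self.atoms)


def schedule(queries, alpha=0.0):
    """Greedy batch-scheduling sketch (a simplification, not the JAWS algorithm).

    Each step reads one atom and serves every pending sub-query that needs it.
    An atom's score is the number of queries waiting on it plus alpha times the
    age of the oldest such query: alpha = 0 maximizes data sharing, while a
    larger alpha favors old queries to limit starvation.
    Mutates each query's `remaining` set.
    Returns (atom, [query ids served]) pairs in execution order.
    """
    clock = 0
    order = []
    while any(q.remaining for q in queries):
        arrived = [q for q in queries if q.remaining and q.arrival <= clock]
        if not arrived:
            clock += 1  # idle until the next query arrives
            continue
        # Group pending sub-queries by the atom they read.
        waiting = defaultdict(list)
        for q in arrived:
            for atom in q.remaining:
                waiting[atom].append(q)

        def priority(atom):
            oldest_age = max(clock - q.arrival for q in waiting[atom])
            # Negate the score so min() picks the best atom; ties go to the smallest name.
            return (-(len(waiting[atom]) + alpha * oldest_age), atom)

        atom = min(waiting, key=priority)
        served = sorted(q.qid for q in waiting[atom])
        for q in waiting[atom]:
            q.remaining.discard(atom)  # one read satisfies every batched sub-query
        order.append((atom, served))
        clock += 1  # assume each atom read costs one time unit
    return order


if __name__ == "__main__":
    # Five sub-queries over three atoms; sharing reduces this to three reads.
    workload = [Query("q1", 0, {"a", "b"}), Query("q2", 0, {"b", "c"}), Query("q3", 0, {"b"})]
    for atom, served in schedule(workload):
        print(atom, served)
    # b ['q1', 'q2', 'q3']
    # a ['q1']
    # c ['q2']
```

With `alpha=0`, the scheduler purely maximizes sharing. As a result, a query on an unpopular atom can keep waiting while newer queries on a popular atom keep winning. Raising `alpha` lets that query's age eventually outweigh the batch size. The abstract states that JAWS makes this trade-off adaptively and incrementally; the fixed `alpha` used here is a deliberate simplification.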