We present JAWS, a job-aware, data-driven batch scheduler that improves query throughput for data-intensive scientific database clusters. As datasets reach petabyte scale, workloads that scan through vast amounts of data to extract features are gaining importance in the sciences. However, acute performance bottlenecks result when multiple queries execute simultaneously and compete for I/O resources. Our solution, JAWS, divides queries into I/O-friendly sub-queries for scheduling. It then identifies overlapping data requirements within the workload and executes sub-queries in batches to maximize data sharing and reduce redundant I/O. JAWS extends our previous work by supporting workflows in which queries exhibit data dependencies, exploiting workload knowledge to coordinate caching decisions, and combating starvation through adaptive and incremental trade-offs between query throughput and response time. Instrumenting JAWS in the Turbulence Database Cluster yields nearly three-fold improvement in query throughput when contention in the workload is high.
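
The batching idea in the abstract (split queries into sub-queries, find overlapping data requirements, run sub-queries that share data together) can be illustrated with a small sketch. This is not the JAWS implementation. The `SubQuery` type, the one-block-per-sub-query granularity, the scoring rule, and the `age_weight` parameter are all assumptions introduced here for illustration, and the sketch ignores data dependencies between queries and caching decisions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SubQuery:
    query_id: str  # parent query (hypothetical identifier)
    block: int     # data block this sub-query must read (assumed granularity)
    arrival: int   # logical arrival time of the parent query


def split_query(query_id, blocks, arrival):
    """Divide one query into sub-queries, one per data block it touches."""
    return [SubQuery(query_id, b, arrival) for b in blocks]


def schedule_batches(subqueries, now, age_weight=0.0):
    """Group sub-queries by block and order the resulting batches.

    Assumed score per batch:
        (number of sub-queries sharing the block)
        + age_weight * (wait time of the oldest sub-query in the batch)
    age_weight = 0 favors data sharing only; larger values favor
    long-waiting queries, trading throughput for response time.
    """
    by_block = defaultdict(list)
    for sq in subqueries:
        by_block[sq.block].append(sq)

    def score(item):
        _, batch = item
        oldest_wait = now - min(sq.arrival for sq in batch)
        return len(batch) + age_weight * oldest_wait

    # Highest score first; ties broken by block number for determinism.
    return sorted(by_block.items(), key=lambda it: (-score(it), it[0]))


# Three queries with overlapping block requirements.
workload = (
    split_query("q1", [1, 2, 3], arrival=0)
    + split_query("q2", [2, 3, 4], arrival=1)
    + split_query("q3", [3, 5], arrival=2)
)
batches = schedule_batches(workload, now=3)

# Each batch reads its block once, however many sub-queries consume it.
shared_reads = len(batches)
unshared_reads = len(workload)
```

In this toy workload the eight sub-queries touch only five distinct blocks, so batching performs five block reads instead of eight. Raising `age_weight` moves a batch with an old, lonely sub-query ahead of a larger but newer batch, which is one simple way to express a throughput versus response-time trade-off.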