Active and accelerated learning of cost models for optimizing scientific applications

Authors:
Piyush Shivam;Shivnath Babu;Jeff Chase
Affiliations:
Duke University;Duke University;Duke University
Venue:
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Year:
2006

Abstract

We present the NIMO system, which automatically learns cost models for predicting the execution time of computational-science applications running on large-scale networked utilities such as computational grids. Accurate cost models are important for selecting efficient plans for executing these applications on the utility. Computational-science applications are often scripts (written in languages such as Perl or Matlab) connected using a workflow-description language, and they therefore pose different challenges than modeling the execution of plans for declarative queries with well-understood semantics. NIMO generates appropriate training samples for these applications to learn fairly accurate cost models quickly using statistical learning techniques. NIMO's approach is active and noninvasive: it actively deploys and monitors the application under varying conditions, and obtains its training data from passive instrumentation streams that require no changes to the operating system or applications. Our experiments with real scientific applications demonstrate that NIMO significantly reduces the number of training samples and the time to learn fairly accurate cost models.
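
The general idea of actively choosing training runs for a regression cost model can be illustrated with a toy sketch. This is not NIMO's algorithm, which the abstract does not specify. The simulated application `run_application`, the single predictor (inverse CPU speed), the farthest-point sampling rule, and the stopping tolerance are all assumptions made for illustration.

```python
# Illustrative sketch only: a toy active-sampling loop, not NIMO's algorithm.

def run_application(cpu_ghz):
    """Hypothetical stand-in for deploying and timing a real run (seconds)."""
    return 30.0 + 120.0 / cpu_ghz  # assumed fixed I/O time plus compute time


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def learn_cost_model(candidates, test_points, tolerance=0.05, max_samples=10):
    """Add training runs one at a time until the worst relative error is small.

    Assumes `candidates` holds at least two distinct CPU speeds.
    """
    # Measure the held-out test configurations once.
    actual = {t: run_application(t) for t in test_points}
    # Start from the two extreme resource assignments.
    samples = {c: run_application(c) for c in (min(candidates), max(candidates))}
    while True:
        xs = [1.0 / c for c in samples]  # predictor: inverse CPU speed
        ys = list(samples.values())
        a, b = fit_line(xs, ys)
        worst = max(abs(a + b / t - actual[t]) / actual[t] for t in test_points)
        remaining = [c for c in candidates if c not in samples]
        if worst <= tolerance or not remaining or len(samples) >= max_samples:
            return (a, b), len(samples)
        # Next run: the candidate farthest from every configuration tried so far.
        nxt = max(remaining, key=lambda c: min(abs(c - s) for s in samples))
        samples[nxt] = run_application(nxt)


model, runs_used = learn_cost_model([1.0, 1.5, 2.0, 2.5, 3.0], [1.2, 2.2])
```

Because the simulated application is exactly linear in inverse CPU speed, the loop stops after the two initial runs. A real application would be noisy and depend on several resource attributes, so more runs and a richer model would be needed.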