We present NIMO, a system that automatically learns cost models for predicting the execution time of computational-science applications running on large-scale networked utilities such as computational grids. Accurate cost models are important for selecting efficient plans for executing these applications on the utility. Computational-science applications are often scripts (written in languages such as Perl or Matlab) connected using a workflow-description language, and therefore pose different challenges than modeling the execution of plans for declarative queries with well-understood semantics. NIMO generates appropriate training samples for these applications so that fairly accurate cost models can be learned quickly using statistical learning techniques. NIMO's approach is active and noninvasive: it actively deploys and monitors the application under varying conditions, and it obtains its training data from passive instrumentation streams that require no changes to the operating system or applications. Our experiments with real scientific applications demonstrate that NIMO significantly reduces both the number of training samples and the time needed to learn fairly accurate cost models.
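The idea of actively choosing training runs, rather than sampling exhaustively, can be illustrated with a small sketch. This is not NIMO's algorithm, which the abstract does not specify. It is a generic predict-then-measure loop under stated assumptions: a hypothetical `run_application` function simulates one deployment, the only resource varied is CPU speed, and the cost model is a simple least-squares line in `1 / cpu_ghz`. All names and constants are invented for illustration.

```python
import random


def run_application(cpu_ghz, rng):
    """Hypothetical stand-in for deploying and timing one run.

    The constants are assumptions, not measurements from any real system.
    """
    compute_work = 120.0  # assumed seconds of CPU work at 1 GHz
    fixed_io = 15.0       # assumed seconds of I/O, independent of CPU speed
    return compute_work / cpu_ghz + fixed_io + rng.gauss(0.0, 0.5)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def learn_cost_model(candidate_speeds, max_rel_error=0.05, seed=0):
    """Sample runs one at a time until the model predicts a new run well."""
    rng = random.Random(seed)
    remaining = sorted(candidate_speeds)
    # Start from the two extreme resource assignments.
    chosen = [remaining.pop(0), remaining.pop(-1)]
    xs = [1.0 / s for s in chosen]
    ys = [run_application(s, rng) for s in chosen]
    slope, intercept = fit_line(xs, ys)

    while remaining:
        # Next sample: the candidate farthest from every speed tried so far.
        nxt = max(remaining, key=lambda s: min(abs(s - c) for c in chosen))
        remaining.remove(nxt)
        predicted = slope / nxt + intercept
        measured = run_application(nxt, rng)
        # Fold the new observation into the model before deciding to stop.
        chosen.append(nxt)
        xs.append(1.0 / nxt)
        ys.append(measured)
        slope, intercept = fit_line(xs, ys)
        if abs(predicted - measured) / measured <= max_rel_error:
            break  # the model already predicted this run accurately enough

    return slope, intercept, len(chosen)


if __name__ == "__main__":
    slope, intercept, runs = learn_cost_model([1.0, 1.5, 2.0, 2.5, 3.0])
    print(f"time ~= {slope:.1f} / cpu_ghz + {intercept:.1f} (from {runs} runs)")
```

The loop stops as soon as a fresh run is predicted within the error bound, so the number of deployments depends on how quickly the model converges rather than on the size of the candidate set. A real system would vary several resources at once and would need a more careful stopping rule than a single held-out prediction.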