Most data centers, clouds and grids consist of multiple generations of computing systems, each with a different performance profile, which makes it difficult for job schedulers to achieve the best usage of the infrastructure. A useful piece of information for scheduling jobs, typically not available, is the extent to which applications will use available resources once they are executed. This paper comparatively assesses the suitability of several machine learning techniques for predicting the spatiotemporal utilization of resources by applications. Modern machine learning techniques able to handle a large number of attributes are used, taking into account application- and system-specific attributes (e.g., CPU microarchitecture, size and speed of memory and storage, input data characteristics and input parameters). The work also extends an existing classification tree algorithm, called Predicting Query Runtime (PQR), to the regression problem by allowing each leaf of the tree to select the best regression method for the data collected at that leaf. The new method (PQR2) yields the best average percentage error when predicting execution time, memory and disk consumption for two bioinformatics applications, BLAST and RAxML, deployed in scenarios that differ in system and usage. In specific scenarios where usage is a non-linear function of system and application attributes, certain configurations of two other machine learning algorithms, Support Vector Machine and k-nearest neighbors, also yield competitive results. In addition, experiments show that including system performance and application-specific attributes improves the performance of the machine learning algorithms investigated.
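The idea of letting each leaf pick its own regression method can be illustrated with a minimal sketch. This is not the authors' PQR2 implementation: the three candidate regressors, the single numeric attribute, the leave-one-out scoring, and every function name below are assumptions made for illustration. The sketch only shows the selection step for the data at one leaf, choosing the candidate with the lowest mean percentage error.

```python
from statistics import mean

# Hypothetical candidate regressors for one leaf. Each takes training
# attribute values xs and observed usage ys, and returns a predictor.

def fit_mean(xs, ys):
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    # Ordinary least squares on a single attribute.
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate case: fall back to the mean
        return lambda x: my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_nearest(xs, ys):
    # 1-nearest-neighbor on a single attribute.
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

CANDIDATES = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}

def loo_percentage_error(fit, xs, ys):
    # Leave-one-out mean percentage error; assumes every y is non-zero.
    errors = []
    for i in range(len(xs)):
        predictor = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append(abs(predictor(xs[i]) - ys[i]) / abs(ys[i]))
    return 100 * mean(errors)

def select_leaf_model(xs, ys):
    # Score every candidate on this leaf's data and keep the best one.
    scores = {name: loo_percentage_error(fit, xs, ys)
              for name, fit in CANDIDATES.items()}
    best = min(scores, key=scores.get)
    return best, CANDIDATES[best](xs, ys), scores

# Usage: a leaf whose usage grows linearly with the attribute,
# and a leaf whose usage jumps between two levels.
linear_name, linear_model, _ = select_leaf_model(
    [1, 2, 3, 4, 5], [10, 20, 30, 40, 50])
step_name, step_model, _ = select_leaf_model(
    [1, 2, 3, 10, 11, 12], [5, 5, 5, 50, 50, 50])
```

Under these assumptions, different leaves can end up with different regressors, which is the property the abstract attributes to PQR2. A fuller version would handle multiple attributes and build the tree itself.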