Mining performance data for metascheduling decision support in the grid

  • Authors:
  • Hui Li;David Groep;Lex Wolters

  • Affiliations:
  • Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Leiden, The Netherlands;National Institute for Nuclear and High Energy Physics (NIKHEF), Amsterdam, The Netherlands;Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Leiden, The Netherlands

  • Venue:
  • Future Generation Computer Systems - Special section: Data mining in grid computing environments
  • Year:
  • 2007

Abstract

Metaschedulers in the Grid need dynamic information to support their scheduling decisions. Job response time on computing resources is one such performance metric. In this paper, we propose an instance-based learning technique that predicts response times by mining historical performance data. The novelty of our approach is the introduction of policy attributes for representing and comparing resource states, where a resource state is defined as the pool of running and queued jobs on a resource at the time a prediction is made. The policy attributes reflect the local scheduling policies, and they can be discovered automatically using genetic search. We validate the technique in an extensive empirical evaluation using real workload traces collected from the NIKHEF production cluster on the LHC Computing Grid and from Blue Horizon at the San Diego Supercomputer Center (SDSC). The experimental results show that acceptable prediction accuracy can be achieved: the normalized average prediction errors for response times range from 0.57 to 0.79.
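
To make the idea of instance-based prediction concrete, the following is a minimal sketch of a nearest-neighbour estimator over historical instances. It is not the authors' implementation: the attribute tuple (queued jobs, running jobs, requested CPUs), the weighted Euclidean distance, the hand-picked weights, the value of k, and all data values are assumptions made for illustration. The paper's policy attributes and their discovery by genetic search are not reproduced here.

```python
from math import sqrt


def distance(a, b, weights):
    """Weighted Euclidean distance between two attribute vectors (assumed metric)."""
    return sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def predict_response_time(history, query, weights, k=3):
    """Average the response times of the k historical instances nearest to the query."""
    if not history:
        raise ValueError("history must contain at least one instance")
    nearest = sorted(history, key=lambda inst: distance(inst[0], query, weights))[:k]
    return sum(response_time for _, response_time in nearest) / len(nearest)


# Hypothetical history: (queued_jobs, running_jobs, requested_cpus) -> response time in seconds.
history = [
    ((2, 10, 1), 300.0),
    ((3, 12, 1), 360.0),
    ((20, 64, 8), 5400.0),
    ((18, 60, 8), 4800.0),
    ((1, 8, 1), 240.0),
]

# Hand-picked attribute weights (an assumption; a search procedure could tune them).
weights = (1.0, 0.5, 2.0)

estimate = predict_response_time(history, (2, 11, 1), weights, k=3)
print(f"estimated response time: {estimate:.1f} s")
```

In this sketch the prediction is simply the mean response time of the most similar past instances, so its quality depends entirely on how well the chosen attributes and weights capture the resource state.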