The Grid Workloads Archive

  • Authors:
  • Alexandru Iosup;Hui Li;Mathieu Jan;Shanny Anoep;Catalin Dumitrescu;Lex Wolters;Dick H. J. Epema

  • Affiliations:
  • Faculty of Electrical Engineering, Mathematics, and Computer Science, Delft University of Technology, The Netherlands;LIACS, University of Leiden, The Netherlands;Faculty of Electrical Engineering, Mathematics, and Computer Science, Delft University of Technology, The Netherlands;Faculty of Electrical Engineering, Mathematics, and Computer Science, Delft University of Technology, The Netherlands;Faculty of Electrical Engineering, Mathematics, and Computer Science, Delft University of Technology, The Netherlands;LIACS, University of Leiden, The Netherlands;Faculty of Electrical Engineering, Mathematics, and Computer Science, Delft University of Technology, The Netherlands

  • Venue:
  • Future Generation Computer Systems
  • Year:
  • 2008

Abstract

While large grids currently support the work of thousands of scientists, very little is known about how they are actually used. Because of strict organizational permissions, few or no traces of grid workloads are available to grid researchers and practitioners. To address this problem, we present the Grid Workloads Archive (GWA), which is both a workload data exchange and a meeting point for the grid community. We define the requirements for building a workload archive and describe how the GWA meets them. We introduce a format for sharing grid workload information, together with tools associated with this format. Using these tools, we collect and analyze data from nine well-known grid environments, comprising more than 2,000 users who submitted more than 7 million jobs over a total of more than 13 operational years, in working environments spanning over 130 sites with 10,000 resources. We show evidence that grid workloads differ markedly from those encountered in other large-scale environments, and in particular from the workloads of parallel production environments: they consist almost exclusively of single-node jobs, and jobs arrive in "bags-of-tasks". Finally, we present the immediate applications of the GWA and its content in several critical areas of grid research and practice: grid resource management, and grid design, operation, and maintenance.
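The two workload observations above, that grid jobs are almost exclusively single-node and that they arrive in bags-of-tasks, can be checked on a job trace with a few lines of code. The following is a minimal Python sketch, not the GWA tooling or the paper's method: the `Job` record and its fields, the helper names, and the grouping rule (a user's consecutive submissions no more than a hypothetical `delta` of 120 seconds apart belong to the same bag) are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical trace record; these are not the GWA format's actual fields.
    user: str          # submitting user
    submit_time: int   # submission time, in seconds
    processors: int    # number of processors the job requested


def single_node_fraction(jobs):
    """Return the fraction of jobs that request exactly one processor."""
    if not jobs:
        return 0.0
    return sum(1 for j in jobs if j.processors == 1) / len(jobs)


def group_bags_of_tasks(jobs, delta=120):
    """Group each user's jobs into bags-of-tasks.

    Assumed rule (illustrative only): sorted by submit time, a user's job joins
    the current bag if it arrives at most `delta` seconds after that user's
    previous job; otherwise it starts a new bag.
    """
    by_user = defaultdict(list)
    for job in jobs:
        by_user[job.user].append(job)

    bags = []
    for user_jobs in by_user.values():
        user_jobs.sort(key=lambda j: j.submit_time)
        current = [user_jobs[0]]
        for prev, job in zip(user_jobs, user_jobs[1:]):
            if job.submit_time - prev.submit_time <= delta:
                current.append(job)
            else:
                bags.append(current)
                current = [job]
        bags.append(current)
    return bags


if __name__ == "__main__":
    jobs = [
        Job("alice", 0, 1), Job("alice", 30, 1), Job("alice", 60, 1),
        Job("bob", 10, 1), Job("alice", 5000, 1), Job("bob", 20, 32),
    ]
    print(round(single_node_fraction(jobs), 3))             # 0.833
    print([len(b) for b in group_bags_of_tasks(jobs)])      # [3, 1, 2]
```

The gap-based rule is only one simple way to delimit bags; an actual analysis would read jobs from an archived trace in the archive's sharing format and would need to justify its choice of `delta`.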