Ensemble dispatching on an IBM Blue Gene/L for a bioinformatics knowledge environment

  • Authors:
  • Paul Marshall; Matthew Woitaszek; Henry M. Tufo; Rob Knight; Daniel McDonald; Julia Goodrich

  • Affiliations:
  • University of Colorado at Boulder, Boulder, CO; National Center for Atmospheric Research, Boulder, CO; University of Colorado at Boulder, Boulder, CO; University of Colorado at Boulder, Boulder, CO; University of Colorado at Boulder, Boulder, CO; University of Colorado at Boulder, Boulder, CO

  • Venue:
  • Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers
  • Year:
  • 2009

Abstract

This paper discusses our work on supporting the processing of large numbers of short tasks, carried out as part of our development of a collaborative bioinformatics knowledge environment for structural biologists, environmental microbiologists, and evolutionary biologists. We have designed and implemented a new ensemble-based task dispatching system and deployed it on a Blue Gene/L system in conjunction with the Blue Gene's High Throughput Computing (HTC) capability. Unlike our prior general-purpose, database-backed HTC task dispatching system, the ensemble-based system can efficiently process and dispatch large numbers of very short tasks to over a thousand cores. We also investigate the scalability of the IBM Blue Gene/L in HTC mode more generally, identifying and eliminating processor-reboot inefficiencies that affect very short tasks in specific applications, which makes the Blue Gene/L a feasible processing system for this bioinformatics workload.