Technical Challenges of Supporting Interactive HPC

  • Authors:
  • A. Reuther; J. Kepner; A. McCabe; J. Mullen; N. Bliss; H. Kim

  • Venue:
  • HPCMP-UGC '07 Proceedings of the 2007 DoD High Performance Computing Modernization Program Users Group Conference
  • Year:
  • 2007

Abstract

Users' demand for interactive, on-demand access to a large pool of high performance computing (HPC) resources is increasing. The majority of users at Massachusetts Institute of Technology Lincoln Laboratory (MIT LL) are involved in the interactive development of sensor processing algorithms. This development often requires a large amount of computation due to the complexity of the algorithms being explored and/or the size of the data set being analyzed. These researchers also require rapid turnaround of their jobs because each iteration directly influences the code changes made for the following iteration. Historically, batch queue systems have not been a good match for this kind of user. The Lincoln Laboratory Grid (LLGrid) system at MIT LL is the largest dedicated interactive, on-demand HPC system in the world. While the system also accommodates some batch queue jobs, the vast majority of jobs submitted are interactive, on-demand jobs. Choosing between running a system with a batch queue or in an interactive, on-demand manner involves tradeoffs. This paper discusses the tradeoffs among operating a cluster as a batch system, an interactive, on-demand system, or a hybrid system. The LLGrid system has been operational for over three years and now serves over 200 users from across Lincoln Laboratory. The system has run over 100,000 interactive jobs and has become an integral part of many researchers' algorithm development workflows. For instance, in batch queue systems, an individual user can commonly gain access to 25% of the processors in the system after the job has waited in the queue; in our experience with on-demand, interactive operation, individual users often gain access to a comparable 20-25% of the cluster processors. This paper shares a variety of new data on our experiences with running an interactive, on-demand system that also provides some batch queue access.

This work is sponsored by the Department of the Air Force under Air Force contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.

Keywords: grid computing, on-demand, interactive high performance computing, cluster computing, parallel MATLAB
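The tradeoff the abstract quantifies can be made concrete with a back-of-the-envelope model of time-to-result per development iteration. The sketch below (Python, not from the paper) assumes ideal parallel speedup and uses hypothetical queue-wait and runtime values; only the cluster fractions (roughly 25% for batch after queuing, 20-25% immediately for interactive, on-demand use) are taken from the abstract.

```python
# Illustrative sketch only: the cluster fractions come from the abstract;
# the wait and runtime figures below are hypothetical placeholders, not
# measurements from the LLGrid system.

def time_to_result(queue_wait_min, runtime_min_on_full_cluster, cluster_fraction):
    """Rough time-to-result for one algorithm-development iteration:
    queue wait plus compute time, where compute time scales inversely
    with the fraction of the cluster the job receives (ideal speedup)."""
    return queue_wait_min + runtime_min_on_full_cluster / cluster_fraction

# Batch: ~25% of the processors, but only after waiting in the queue.
batch = time_to_result(queue_wait_min=60,
                       runtime_min_on_full_cluster=10,
                       cluster_fraction=0.25)

# Interactive, on-demand: a comparable 20-25% of the processors, no queue wait.
interactive = time_to_result(queue_wait_min=0,
                             runtime_min_on_full_cluster=10,
                             cluster_fraction=0.20)

print(f"batch iteration:       {batch:.0f} min")
print(f"interactive iteration: {interactive:.0f} min")
```

Under these illustrative numbers the interactive iteration finishes in roughly half the time of the batch iteration, which is why rapid turnaround, rather than raw processor share, drives the preference for on-demand operation described in the abstract.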