Resource monitoring and management with OVIS to enable HPC in cloud computing environments

Authors:
Jim Brandt;Ann Gentile;Jackson Mayo;Philippe Pebay;Diana Roe;David Thompson;Matthew Wong
Affiliations:
Sandia National Laboratories, MS 9159, P.O. Box 969, Livermore, CA 94551 U.S.A.;Sandia National Laboratories, MS 9152, P.O. Box 969, Livermore, CA 94551 U.S.A.;Sandia National Laboratories, MS 9159, P.O. Box 969, Livermore, CA 94551 U.S.A.;Sandia National Laboratories, MS 9159, P.O. Box 969, Livermore, CA 94551 U.S.A.;Sandia National Laboratories, MS 9152, P.O. Box 969, Livermore, CA 94551 U.S.A.;Sandia National Laboratories, MS 9159, P.O. Box 969, Livermore, CA 94551 U.S.A.;Sandia National Laboratories, MS 9152, P.O. Box 969, Livermore, CA 94551 U.S.A.
Venue:
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
Year:
2009

Abstract

Using the cloud computing paradigm, a host of companies promise to make huge compute resources available to users on a pay-as-you-go basis. These resources can be configured on the fly to provide the hardware and operating system of choice to the customer on a large scale. While the current target market for these resources in the commercial space is web development/hosting, this model has the lure of savings in ownership, operation, and maintenance costs, and thus sounds like an attractive solution for people who currently invest millions to hundreds of millions of dollars annually in High Performance Computing (HPC) platforms in order to support large-scale scientific simulation codes. Given the current interconnect bandwidth and topologies utilized in these commercial offerings, however, the only currently viable market in HPC would be small-memory-footprint, embarrassingly parallel or loosely coupled applications, which inherently require little to no inter-processor communication. While providing the appropriate resources (bandwidth, latency, memory, etc.) for the HPC community would increase the potential to enable HPC in cloud environments, this would not address the need for scalability and reliability, which are crucial to HPC applications. Providing for these needs is particularly difficult in commercial cloud offerings, where the number of virtual resources can far outstrip the number of physical resources, the resources are shared among many users, and the resources may be heterogeneous. Advanced resource monitoring, analysis, and configuration tools can help address these issues, since they bring the ability to dynamically provide and respond to information about the platform and application state and would enable more appropriate, efficient, and flexible use of the resources key to enabling HPC. Additionally, such tools could benefit non-HPC cloud providers, users, and applications by providing more efficient resource utilization in general.