Groupware: some issues and experiences
Communications of the ACM
MPICH2: A New Start for MPI Implementations
Proceedings of the 9th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Sun Grid Engine: Towards Creating a Compute Power Grid
CCGRID '01 Proceedings of the 1st International Symposium on Cluster Computing and the Grid
NPACI Rocks: Tools and Techniques for Easily Deploying Manageable Linux Clusters
CLUSTER '01 Proceedings of the 3rd IEEE International Conference on Cluster Computing
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Zimbra collaboration suite, Version 4.5
Linux Journal
Entering the petaflop era: the architecture and performance of Roadrunner
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Nagios: System and Network Monitoring
ZK Step-By-Step: Ajax without JavaScript Framework
High Performance Computing (HPC) is becoming increasingly popular. The largest supercomputers in the world now have hundreds of thousands of processors and may consequently experience more software and hardware failures. HPC center managers also have to deal with multiple clusters from different vendors, each with its own architecture. However, since there are not enough HPC experts to manage all the new supercomputers, non-experts are expected to end up managing these large clusters. In this paper we study the new challenges of managing HPC environments that contain clusters of different sizes and architectures. We review the available tools and present LEMMing [1], an easy-to-use open source tool developed to support high performance computing centers. LEMMing integrates machine resources and the available management and monitoring tools into a single point of management.