High-performance clusters are rapidly becoming an important computing platform for both scientific and business applications. Meeting these new demands and challenges has inevitably made cluster system software complex, and even for experienced administrators, managing a cluster system is exhausting work. This paper introduces Fire Phoenix, scalable, self-managing cluster system software that supports both scientific and commercial applications. Its self-configuring and self-healing features allow much of the machine configuration and error recovery to be performed automatically. Our design has proven effective in the operation of the Dawning 4000A supercomputer, the largest cluster system in China.
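This description does not say how Fire Phoenix implements self-healing. As a rough illustration of the general idea only, the Python sketch below shows a heartbeat-based detect, recover, and escalate loop. The class `SelfHealingMonitor`, its `timeout` and `max_restarts` parameters, and the `restart` callback are hypothetical names introduced here; they are not part of Fire Phoenix, and the real system may work quite differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class NodeState:
    """Hypothetical per-node bookkeeping kept by the monitor."""
    last_heartbeat: float
    restart_attempts: int = 0
    online: bool = True


class SelfHealingMonitor:
    """Illustrative self-healing loop (an assumption, not Fire Phoenix's design).

    A node whose last heartbeat is older than `timeout` seconds is treated as
    failed. The monitor calls the hypothetical `restart` callback up to
    `max_restarts` times, then marks the node offline so no new work is placed on it.
    """

    def __init__(self, timeout: float, max_restarts: int,
                 restart: Callable[[str], bool]) -> None:
        self.timeout = timeout
        self.max_restarts = max_restarts
        self.restart = restart
        self.nodes: Dict[str, NodeState] = {}
        self.log: List[str] = []

    def heartbeat(self, name: str, now: float) -> None:
        # A heartbeat proves the node is healthy: clear failure state.
        state = self.nodes.setdefault(name, NodeState(last_heartbeat=now))
        state.last_heartbeat = now
        state.restart_attempts = 0
        state.online = True

    def check(self, now: float) -> None:
        # One pass of detect -> recover -> escalate over all known nodes.
        for name in sorted(self.nodes):
            state = self.nodes[name]
            if not state.online or now - state.last_heartbeat <= self.timeout:
                continue  # already offline, or heartbeat is fresh enough
            if state.restart_attempts < self.max_restarts:
                state.restart_attempts += 1
                if self.restart(name):
                    # Assumption: a successful restart resets the heartbeat clock.
                    state.last_heartbeat = now
                    self.log.append(f"{name}: restarted")
                else:
                    self.log.append(f"{name}: restart failed")
            else:
                # Recovery budget exhausted: escalate by taking the node offline.
                state.online = False
                self.log.append(f"{name}: marked offline")


# Usage: node2 stops sending heartbeats and cannot be restarted.
attempts: List[str] = []

def fake_restart(name: str) -> bool:
    # Hypothetical restart hook; pretend node2 is unrecoverable.
    attempts.append(name)
    return name != "node2"

mon = SelfHealingMonitor(timeout=10.0, max_restarts=2, restart=fake_restart)
mon.heartbeat("node1", now=0.0)
mon.heartbeat("node2", now=0.0)
mon.heartbeat("node1", now=15.0)  # node1 keeps reporting, node2 goes silent
for t in (20.0, 21.0, 22.0):
    mon.check(now=t)
print(mon.log)
# ['node2: restart failed', 'node2: restart failed', 'node2: marked offline']
```

Bounding the number of automatic restarts before escalating keeps a persistently faulty node from being restarted forever; in a real cluster the escalation step would more likely notify an administrator or the job scheduler than simply set a flag.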