Configuring Large High-Performance Clusters at Lightspeed: A Case Study
International Journal of High Performance Computing Applications
Clusters have made the jump from lab prototypes to full-fledged production computing platforms. The number, variety, and specialized configurations of these machines are increasing dramatically, with 32–128-node clusters now commonplace in science labs. As the platform evolves, generic PC hardware is targeted at specialized functions such as login, compute, web server, file server, and visualization engine; this is the logical extension of the standard login/compute dichotomy of traditional Beowulf clusters. Clearly, these specialized nodes (henceforth "cluster appliances") share an immense amount of common configuration and software. What is lacking in many clustering toolkits is the ability to share configuration across appliances and specific hardware (where it should be shared) and to differentiate only where needed. In the NPACI Rocks cluster distribution, we have developed a configuration infrastructure with well-defined inheritance properties that leverages and builds on de facto standards, including XML (with standard parsers), Red Hat Kickstart, HTTP transport, CGI, SQL databases, and graph constructs, to easily define cluster appliances. Our approach neither resorts to replication of configuration files nor requires building a "golden" reference image. By relying on this descriptive and programmatic infrastructure, and by carefully demarcating configuration information from the software packages (which are a bit-delivery mechanism), we can easily handle the heterogeneity of appliances, accommodate small hardware differences among particular instances of an appliance (such as IDE vs. SCSI), and support large hardware differences (such as x86 vs. IA-64) with the same infrastructure. Our mechanism is easily extended to other descriptive infrastructures (such as Solaris JumpStart as a backend target) and has been proven on over 100 clusters with significant hardware and configuration differences among them.
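As a concrete illustration of the graph construct (a minimal sketch, not drawn from the paper; the file paths, node names, and package names here are hypothetical), a Rocks-style appliance is described by XML node files carrying package lists and post-install scripts, plus graph edges that compose shared and specialized nodes:

    <!-- nodes/web-server.xml: hypothetical appliance node file.
         The package list and post section are spliced into the
         Red Hat Kickstart file generated for this appliance. -->
    <kickstart>
      <package>httpd</package>
      <post>
    /sbin/chkconfig --add httpd
      </post>
    </kickstart>

    <!-- graphs/default/web-server.xml: edges attach the shared base
         configuration to this specialized appliance. -->
    <graph>
      <edge from="web-server" to="base"/>
      <edge from="web-server" to="apache"/>
    </graph>

Traversing the graph from an appliance's root node and concatenating the visited node files yields a complete Kickstart file, so configuration common to all appliances lives once in shared nodes (such as "base" above) rather than being replicated per appliance or baked into a golden image.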