Beowulf-class systems employ inexpensive commodity processors, open-source operating systems and communication libraries, and commodity networking hardware to deliver supercomputer performance at the lowest possible price. Small to medium-sized Beowulf systems are installed or planned at dozens of universities, laboratories, and industrial sites around the world. The design space for larger systems, however, is largely unexplored.

We investigate two interconnection techniques that would allow the scaling of a Beowulf-class system to unprecedented sizes (hundreds or thousands of processors). The first is a commercially available high-speed backplane with Fast Ethernet and Gigabit Ethernet switch modules. The second is a network that combines point-to-point connections and commodity switches, employing the Beowulf nodes as IP routers as well as computational engines. The first method has the drawback of a relatively high price tag: a significant but not overwhelming fraction of the cost of the system must be devoted to the switch. The second approach is inexpensive, but introduces extra latency and may restrict bandwidth, depending on the topology employed. Our tests are designed to quantify these effects.

We perform three sets of tests. Low-level tests measure the hardware performance of the individual components, i.e., bandwidths and latencies through various combinations of network interface cards, node-based routers, and switches. System-level tests measure the bisection bandwidth and the ability of the system to communicate and compute simultaneously, under conditions designed to generate contention and heavy loads. Finally, application benchmarks measure the performance delivered to a subset of the NAS Parallel Benchmark suite.

Low-level measurements indicate that latencies are only weakly affected by the introduction of node-based routers, and that bandwidths are completely unaffected.
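The low-level latency measurements described above are ping-pong style tests in the spirit of lmbench. As a rough illustration of the method (not of the paper's instrumentation, which is in C and runs across real network hardware), the sketch below measures round-trip time for a one-byte message over a loopback TCP socket; the host, port, and round count are assumptions for the example.

```python
# Hypothetical ping-pong latency sketch: one-byte messages isolate latency
# from bandwidth, and one-way latency is estimated as half the average RTT.
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 9337  # assumed loopback endpoint for illustration
ROUNDS = 1000
MSG = b"x"  # 1-byte payload

def echo_server():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            for _ in range(ROUNDS):
                conn.sendall(conn.recv(1))  # echo each byte straight back

threading.Thread(target=echo_server, daemon=True).start()
time.sleep(0.2)  # crude wait for the server socket to be listening

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect((HOST, PORT))
    cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # avoid Nagle delays
    start = time.perf_counter()
    for _ in range(ROUNDS):
        cli.sendall(MSG)
        cli.recv(1)
    elapsed = time.perf_counter() - start

print(f"one-way latency: {elapsed / ROUNDS / 2 * 1e6:.1f} us")
```

Run between two hosts (with `HOST` set to the remote node's address), the same loop would include the cost of any switches or node-based routers on the path, which is exactly the quantity the low-level tests compare.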
Switches introduce between 19 and 30 microseconds of latency, and Linux-based routers introduce 75 microseconds.

System-level measurements indicate that even under load, the routed network delivers a bisection bandwidth near what one would expect by simply summing the capacities of the individual components. A very large scatter is introduced into latencies, however.

Application-level measurements reveal that the routed network is acceptable for some applications, but can have a significant deleterious effect on others. Successful use of routed networks for very large Beowulf systems will require a deeper understanding of this issue.
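The system-level claim, that the routed network's bisection bandwidth approaches the simple sum of the link capacities crossing the cut, can be illustrated with a back-of-the-envelope check. All numbers below are hypothetical assumptions for the example, not the paper's measurements.

```python
# Hypothetical check of the "sum of components" expectation for bisection
# bandwidth. Assumed values: Fast Ethernet links (100 Mb/s each) and 16
# point-to-point links crossing the bisection of the machine.
FAST_ETHERNET_MBPS = 100          # capacity of one Fast Ethernet link
LINKS_ACROSS_BISECTION = 16       # assumed links crossing the cut

expected_mbps = FAST_ETHERNET_MBPS * LINKS_ACROSS_BISECTION
measured_mbps = 1480              # hypothetical measurement under load

efficiency = measured_mbps / expected_mbps
print(f"expected {expected_mbps} Mb/s, measured {measured_mbps} Mb/s "
      f"({efficiency:.0%} of the simple sum)")
```

A result near 100% of the simple sum is what the abstract reports for bandwidth; the caveat is the large scatter in latencies, which this aggregate figure does not capture.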