Building clusters for parallel computing from off-the-shelf commodity workstations and PCs has become common practice. For a given budget and a given type of application, the cost-effectiveness of a cluster computing platform is mainly determined by the configuration of its memory hierarchy and its interconnection network. Finding a cost-effective configuration through exhaustive simulation would be highly time-consuming, and predicting one from measurements on existing clusters would be impractical. We present an analytical model for evaluating the performance impact of memory hierarchies and networks on cluster computing. By changing various architectural parameters, the model covers the memory hierarchy of a single SMP, a cluster of workstations/PCs, or a cluster of SMPs. The analysis also includes network variations covering both bus and switch networks. Different types of applications are characterized by parameterized workloads with different computation and communication requirements. The model has been validated by simulations and measurements. The experimental workloads include both scientific applications and commercial workloads. Our study shows that, for many types of workloads, the depth of the memory hierarchy is the factor to which execution time is most sensitive. However, the interconnection network of a tightly coupled system with a shallow memory hierarchy, such as an SMP, costs significantly more than a normal cluster network connecting independent computer nodes. Thus, the essential issue to be considered is the trade-off between the depth of the memory hierarchy and the system cost. Based on analyses and case studies, we present quantitative recommendations for building cost-effective clusters for different workloads.
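The abstract does not reproduce the analytical model itself. As a rough illustration of why the depth of the memory hierarchy can dominate execution time, the following minimal sketch computes an expected cost per memory access for a shallow hierarchy (SMP-like) and a deeper one (cluster-like, with remote memory reached over the network). This is not the paper's model: the `Level` class, the `average_access_time` function, and every latency and hit fraction below are hypothetical placeholders chosen for illustration, not measured values.

```python
from dataclasses import dataclass


@dataclass
class Level:
    name: str
    latency_cycles: float  # hypothetical cost of an access serviced at this level
    hit_fraction: float    # fraction of accesses reaching this level that it services


def average_access_time(levels):
    """Expected cycles per access for a hierarchy listed from fastest to slowest."""
    reach = 1.0  # probability that an access gets as far as the current level
    total = 0.0
    for level in levels:
        total += reach * level.hit_fraction * level.latency_cycles
        reach *= 1.0 - level.hit_fraction
    if reach > 1e-12:
        raise ValueError("the last level must service all remaining accesses")
    return total


# Illustrative, made-up parameters (assumptions, not data from the paper).
smp = [
    Level("cache", 1.0, 0.9),
    Level("shared memory", 50.0, 1.0),
]
cluster = [
    Level("cache", 1.0, 0.9),
    Level("local memory", 50.0, 0.8),
    Level("remote memory over network", 2000.0, 1.0),
]

if __name__ == "__main__":
    print("SMP-like hierarchy:    ", average_access_time(smp))
    print("cluster-like hierarchy:", average_access_time(cluster))
```

With these placeholder numbers, the extra level raises the expected access cost by several times even though only a small fraction of accesses reach it. This is the kind of sensitivity that has to be weighed against the higher interconnect cost of a tightly coupled system.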