Multi-core processors are emerging as a new industry trend as single-core processors rapidly reach the physical limits of possible complexity and speed. In the new Top500 supercomputer list, more than 20% of the processors belong to the multi-core processor family. However, without an in-depth study of application behaviors and trends on multi-core clusters, we might not be able to understand the characteristics of multi-core clusters in a comprehensive manner, and hence might not be able to get optimal performance. In this paper, we take on these challenges and design a set of experiments to study the impact of multi-core architecture on cluster computing. We choose one of the most advanced multi-core servers, the Intel Bensley system with Woodcrest processors, as our evaluation platform, and use benchmarks including HPL, NAMD, and NAS as the applications to study. From our message distribution experiments, we find that on average about 50% of messages are transferred through intra-node communication, which is much higher than intuition would suggest. This trend indicates that optimizing intra-node communication is as important as optimizing inter-node communication in a multi-core cluster. We also observe that cache and memory contention may be a potential bottleneck in multi-core clusters, and that communication middleware and applications should be multi-core aware to alleviate this problem. We demonstrate that multi-core aware algorithms, e.g. data tiling, improve benchmark execution time by up to 70%. We also compare the scalability of a multi-core cluster with that of a single-core cluster and find that the scalability of the multi-core cluster is promising.
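To make the data tiling idea concrete, here is a minimal sketch of loop tiling (cache blocking) applied to a dense matrix multiply. The abstract does not say which kernels were tiled or how, so the choice of kernel, the function names `matmul_naive` and `matmul_tiled`, and the default tile size are all assumptions made for illustration. Python is used only to show the loop structure; the cache benefit itself would appear in a compiled implementation.

```python
def matmul_naive(a, b):
    """Reference triple loop: each output element streams a full row and column."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_tiled(a, b, tile=32):
    """Tiled variant: work proceeds block by block so that a small
    working set (roughly three tile x tile blocks) is reused before moving on.
    The tile size is a hypothetical tuning parameter, not a value from the paper."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, tile):          # block of rows of a and c
        for kk in range(0, m, tile):      # block of the shared dimension
            for jj in range(0, p, tile):  # block of columns of b and c
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, m)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + tile, p)):
                            row_c[j] += aik * row_b[j]
    return c
```

Both functions compute the same product; only the order in which elements are visited differs. On a multi-core node where cores share a cache, a tile size chosen so that the active blocks fit in each core's share of that cache is one plausible way to reduce the cache and memory contention described above.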