Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Extending OpenMP to Support Slipstream Execution Mode
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Conjoined-Core Chip Multiprocessing
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Merrimac: Supercomputing with Streams
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Proceedings of the 19th annual international conference on Supercomputing
SimICS/sun4m: a virtual workstation
ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference
Evaluation of OpenMP for the cyclops multithreaded architecture
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
CMP Cache Architecture and the OpenMP Performance
IWOMP '07 Proceedings of the 3rd international workshop on OpenMP: A Practical Programming Model for the Multi-Core Era
Exploiting multithreaded architectures to improve the hash join operation
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Performance Evaluation of MPI, UPC and OpenMP on Multicore Architectures
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Streams: emerging from a shared memory model
IWOMP'08 Proceedings of the 4th international conference on OpenMP in a new era of parallelism
IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming
Performance evaluation of intel's quad core processors for embedded applications
WSEAS Transactions on Computers
Performance evaluation of OpenMP benchmarks on intel's quad core processors
ICCOMP'10 Proceedings of the 14th WSEAS international conference on Computers: part of the 14th WSEAS CSCC multiconference - Volume I
A multi-core numerical framework for characterizing flow in oil reservoirs
Proceedings of the 19th High Performance Computing Symposia
L2-Cache hierarchical organizations for multi-core architectures
ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
HPCS'09 Proceedings of the 23rd international conference on High Performance Computing Systems and Applications
Performance evaluation of a reservoir simulator on a multi-core cluster
ICCSA'10 Proceedings of the 2010 international conference on Computational Science and Its Applications - Volume Part IV
How OpenMP applications get more benefit from many-core era
IWOMP'10 Proceedings of the 6th international conference on Beyond Loop Level Parallelism in OpenMP: accelerators, Tasking and more
Performance analysis and optimization of MPI collective operations on multi-core clusters
The Journal of Supercomputing
Hi-index | 0.00 |
Multiprocessors based on simultaneous multithreaded (SMT) or multicore (CMP) processors are continuing to gain a significant share in both high-performance and mainstream computing markets. In this paper we evaluate the performance of OpenMP applications on these two parallel architectures. We use detailed hardware metrics to identify architectural bottlenecks. We find that the high level of resource sharing in SMTs results in performance complications, should more than 1 thread be assigned on a single physical processor. CMPs, on the other hand, are an attractive alternative. Our results show that the exploitation of the multiple processor cores on each chip results in significant performance benefits. We evaluate an adaptive, run-time mechanism which provides limited performance improvements on SMTs, however the inherent bottlenecks remain difficult to overcome. We conclude that out-of-the-box OpenMP code scales better on CMPs than SMTs. To maximize the efficiency of OpenMP on SMTs, new capabilities are required by the runtime environment and/or the programming interface.