Performance evaluation of interthread communication mechanisms on multicore/multithreaded architectures

Authors:
Davide Pasetto;Massimiliano Meneghin;Hubertus Franke;Fabrizio Petrini;Jimi Xenidis
Affiliations:
IBM Research, Dublin, Ireland;IBM Research, Dublin, Ireland;IBM Research, New York, USA;IBM Research, New York, USA;IBM Research, Austin, USA
Venue:
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Year:
2012

Abstract

The three major solutions for increasing the nominal performance of a CPU are multiplying the number of cores per socket, expanding the embedded cache memories, and using multi-threading to reduce the impact of the deep memory hierarchy. Systems with tens or hundreds of hardware threads, all sharing a cache-coherent UMA or NUMA memory space, are today the de facto standard. While these solutions can easily provide benefits in a multi-program environment, they require applications to be recoded to leverage the available parallelism. Threads must synchronize and exchange data, and overall performance is heavily influenced by the overhead these mechanisms add, especially as developers try to exploit finer-grained parallelism to use all available resources.