BatchQueue: Fast and Memory-Thrifty Core to Core Communication

Authors:
Thomas Preud'homme;Julien Sopena;Gael Thomas;Bertil Folliot
Affiliations:
-;-;-;-
Venue:
SBAC-PAD '10 Proceedings of the 2010 22nd International Symposium on Computer Architecture and High Performance Computing
Year:
2010

Citing 0
Cited 2

High-performance RMA-based broadcast on the intel SCC

Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Understanding the performance of concurrent data structures on graphics processors

Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Sequential applications can take advantage of multi-core systems by way of pipeline parallelism to improve their performance. In such parallelism, core to core communication overhead is the main limit of speedup. This paper presents BatchQueue, a fast and memory-thrifty core to core communication system based on batch processing of whole cache line. BatchQueue is able to send a 32bit word of data in just 12.5 ns on a Xeon X5472 and only needs 2 full cache lines plus 3 byte-sized variables — each on a different cache line for optimal performance — to work. The characteristics of BatchQueue — high throughput and increased latency resulting from its batch processing — makes it well suited for highly communicative tasks with no real time requirements such as monitoring.