Producer-Consumer: the programming model for future many-core processors

Authors:
Arnau Prat-Pérez; David Dominguez-Sal; Josep-Lluis Larriba-Pey; Pedro Trancoso
Affiliations:
DAMA-UPC, Universitat Politècnica de Catalunya, Barcelona, Spain; DAMA-UPC, Universitat Politècnica de Catalunya, Barcelona, Spain, and Sparsity Technologies, Barcelona, Spain; DAMA-UPC, Universitat Politècnica de Catalunya, Barcelona, Spain; Department of Computer Science, University of Cyprus, Nicosia, Cyprus
Venue:
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Year:
2013

Abstract

Adding ever more cores to a chip increases the pressure on main-memory accesses. To avoid this bottleneck, we propose a simple producer-consumer model in which temporary results are transferred directly from one task to another. These transfers take place within the chip, using on-chip memory, and thus avoid costly off-chip memory accesses. We implement this model on a real many-core processor, the 48-core Intel Single-chip Cloud Computer (SCC), using its on-chip memory facilities. We find that the Producer-Consumer model adapts well to such architectures and achieves good task and data parallelism. To evaluate the proposed platform, we implement a graph-based application using the Producer-Consumer model. Our tests show that the model scales very well because it takes advantage of the on-chip memory. Our implementation runs up to 9 times faster than the baseline implementation, which stores the temporary results in main memory.
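
The idea of passing temporary results directly from one task to the next can be illustrated with a minimal sketch. It is not the authors' implementation: it uses Python threads and a bounded in-memory `queue.Queue` as a stand-in for on-chip memory, and the graph computation (triangle counting) is an assumption chosen only for illustration, since the abstract does not name the application. The names `producer`, `consumer`, `count_triangles`, and `capacity` are hypothetical.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream of temporary results


def producer(adj, out_q):
    """Emit one temporary result per undirected edge (u, v) with u < v:
    the number of neighbours that u and v have in common."""
    for u in sorted(adj):
        for v in sorted(adj[u]):
            if u < v:
                out_q.put(len(adj[u] & adj[v]))  # blocks while the buffer is full
    out_q.put(SENTINEL)


def consumer(in_q, result):
    """Aggregate the temporary results as they arrive; nothing is stored
    in an intermediate array between the two tasks."""
    total = 0
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        total += item
    # Each triangle is seen once from each of its three edges.
    result.append(total // 3)


def count_triangles(adj, capacity=4):
    """Run producer and consumer concurrently over a small bounded buffer.
    `capacity` (assumed value) plays the role of a limited on-chip buffer."""
    q = queue.Queue(maxsize=capacity)
    result = []
    t_prod = threading.Thread(target=producer, args=(adj, q))
    t_cons = threading.Thread(target=consumer, args=(q, result))
    t_prod.start()
    t_cons.start()
    t_prod.join()
    t_cons.join()
    return result[0]


if __name__ == "__main__":
    # Adjacency sets of a small undirected graph: one triangle (0, 1, 2) plus a tail edge.
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(count_triangles(graph))
```

The bounded buffer is the design point of the sketch: the producer stalls when the buffer is full, so the volume of in-flight temporary data stays small, which is the property that lets such data live in a small, fast memory instead of main memory. A baseline in the spirit of the abstract would instead write all per-edge results to a list first and sum them in a second pass.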