Scalability evaluation of a polymorphic register file: A CG case study

Authors:
Cătălin B. Ciobanu;Xavier Martorell;Georgi K. Kuzmanov;Alex Ramirez;Georgi N. Gaydadjiev
Affiliations:
Computer Engineering Laboratory, Electrical Engineering Department, Delft University of Technology, The Netherlands;Universitat Politècnica de Catalunya and Barcelona Supercomputing Center, Spain;Computer Engineering Laboratory, Electrical Engineering Department, Delft University of Technology, The Netherlands;Universitat Politècnica de Catalunya and Barcelona Supercomputing Center, Spain;Computer Engineering Laboratory, Electrical Engineering Department, Delft University of Technology, The Netherlands
Venue:
ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
Year:
2011

Abstract

We evaluate the scalability of a Polymorphic Register File using the Conjugate Gradient method as a case study. We focus on a heterogeneous multi-processor architecture, taking into consideration critical parameters such as cache bandwidth and memory latency. We compare the performance of 256 Polymorphic Register File-augmented workers against a single Cell PowerPC Processor Unit (PPU). In this scenario, simulation results suggest that absolute speedups of up to 200 times can be obtained for the Sparse Matrix Vector Multiplication kernel. Moreover, when an equal number of workers in the range 1-256 is employed, our design is between 1.7 and 4.2 times faster than a Cell PPU-based system. Furthermore, we study the impact of memory latency and cache bandwidth on the sustainable speedups of the system considered. Our tests suggest that a 128-worker configuration requires the caches to deliver 1638.4 GB/sec in order to preserve 80% of its peak speedup.
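For readers unfamiliar with the kernel named in the abstract, the following is a minimal sketch of Sparse Matrix Vector Multiplication, the operation that dominates each Conjugate Gradient iteration. The abstract does not state which sparse storage format the authors use; the Compressed Sparse Row (CSR) layout, the function name `spmv_csr`, and the 3x3 example matrix are assumptions made purely for illustration. The last lines divide the quoted aggregate cache bandwidth by the worker count, assuming (hypothetically) that the bandwidth is shared evenly among workers.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A * x for a sparse matrix A stored in CSR form.

    values  : nonzero entries of A, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits row i inside `values`
    x       : dense input vector
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Accumulate the dot product of row `row` with x,
        # touching only the stored nonzeros.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Hypothetical 3x3 symmetric example (not taken from the paper):
# [[4, 0, 1],
#  [0, 3, 0],
#  [1, 0, 2]]
values = [4.0, 1.0, 3.0, 1.0, 2.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0])

# Aggregate cache bandwidth quoted in the abstract, split evenly
# across workers (the even split is an assumption).
aggregate_gb_per_s = 1638.4
workers = 128
per_worker_gb_per_s = aggregate_gb_per_s / workers
```

Under the even-split assumption, the 1638.4 GB/sec figure for 128 workers corresponds to 12.8 GB/sec per worker. The irregular, index-driven access to `x` in the inner loop is one reason why memory latency and cache bandwidth matter for this kernel.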