Scalability evaluation of a polymorphic register file: A CG case study

  • Authors:
  • Cătălin B. Ciobanu; Xavier Martorell; Georgi K. Kuzmanov; Alex Ramirez; Georgi N. Gaydadjiev

  • Affiliations:
  • Computer Engineering Laboratory, Electrical Engineering Department, Delft University of Technology, The Netherlands; Universitat Politècnica de Catalunya and Barcelona Supercomputing Center, Spain; Computer Engineering Laboratory, Electrical Engineering Department, Delft University of Technology, The Netherlands; Universitat Politècnica de Catalunya and Barcelona Supercomputing Center, Spain; Computer Engineering Laboratory, Electrical Engineering Department, Delft University of Technology, The Netherlands

  • Venue:
  • ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
  • Year:
  • 2011

Abstract

We evaluate the scalability of a Polymorphic Register File using the Conjugate Gradient method as a case study. We focus on a heterogeneous multi-processor architecture, taking into consideration critical parameters such as cache bandwidth and memory latency. We compare the performance of 256 Polymorphic Register File-augmented workers against a single Cell PowerPC Processor Unit (PPU). In such a scenario, simulation results suggest that for the Sparse Matrix Vector Multiplication kernel, absolute speedups of up to 200 times can be obtained. Moreover, when an equal number of workers in the range 1-256 is employed, our design is between 1.7 and 4.2 times faster than a Cell PPU-based system. Furthermore, we study the impact of memory latency and cache bandwidth on the sustainable speedups of the system considered. Our tests suggest that a 128-worker configuration requires the caches to deliver 1638.4 GB/sec in order to preserve 80% of its peak speedup.
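
For readers unfamiliar with the workload named in the abstract, the following is a minimal scalar sketch of the Conjugate Gradient method with its Sparse Matrix Vector Multiplication kernel. It is not the authors' implementation and does not model the Polymorphic Register File, the workers, or the memory system. The CSR storage layout, the function names (`spmv_csr`, `dot`, `conjugate_gradient`), and the `tol` and `max_iter` parameters are assumptions made for illustration only.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed layout)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Non-zeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A given in CSR form."""
    x = [0.0] * len(b)   # initial guess x0 = 0
    r = list(b)          # residual r0 = b - A x0 = b
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break        # residual norm is small enough
        ap = spmv_csr(values, col_idx, row_ptr, p)  # the SpMV kernel
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Small symmetric positive-definite example: A = [[4, 1], [1, 3]], b = [1, 2].
values = [4.0, 1.0, 1.0, 3.0]
col_idx = [0, 1, 0, 1]
row_ptr = [0, 2, 4]
solution = conjugate_gradient(values, col_idx, row_ptr, [1.0, 2.0])
```

Each iteration performs one sparse matrix-vector product plus a few vector operations, which is why the abstract singles out that kernel when reporting speedups.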