A Simulation Study of Decoupled Vector Architectures

  • Authors:
  • Roger Espasa; Mateo Valero

  • Affiliations:
  • Roger Espasa: Dept. Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, roger@ac.upc.es
  • Mateo Valero: Dept. Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, mateo@ac.upc.es

  • Venue:
  • The Journal of Supercomputing
  • Year:
  • 1999

Abstract

Decoupling techniques can be applied to a vector processor, yielding a large performance increase on vectorizable programs. We simulate a selection of programs from the Perfect Club and SPECfp92 benchmark suites and compare their execution time on a conventional single-port vector architecture with their execution time on a decoupled vector architecture. Decoupling improves performance by a factor of more than 1.4 for realistic memory latencies; even with an ideal, zero-latency memory system, the speedup reaches 1.3. A significant portion of this paper studies the tradeoffs involved in choosing a suitable size for the queues of the decoupled architecture. The queues need not be large, and therefore need not be costly in hardware, to achieve most of the performance advantage of decoupling.
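To illustrate why a bounded queue lets a decoupled machine hide memory latency, here is a minimal toy cycle-count sketch. It is not the simulator used in the paper. It assumes a hypothetical loop of `n` iterations, each one vector load followed by one compute step, with made-up parameters `mem_latency`, `transfer` (cycles the single memory port is busy per load), `compute`, and `queue_size`. It ignores chaining, stores, multiple functional units, and every other detail of a real vector pipeline.

```python
def coupled_cycles(n, mem_latency, transfer, compute):
    """Toy in-order machine (hypothetical model): each load must complete
    before its compute starts, and the next load waits for that compute."""
    return n * (mem_latency + transfer + compute)


def decoupled_cycles(n, mem_latency, transfer, compute, queue_size):
    """Toy decoupled machine (hypothetical model): an address processor issues
    loads ahead into a bounded load-data queue; a compute processor consumes
    the queue entries in order."""
    if queue_size < 1:
        raise ValueError("queue_size must be at least 1")
    port_free = 0   # cycle at which the single memory port is next free
    starts = []     # starts[i]: cycle at which compute of iteration i begins
    prev_end = 0    # cycle at which the previous compute step finished
    for i in range(n):
        # A queue slot is reserved when a load issues and freed when the
        # compute processor starts consuming that entry.
        slot_free = starts[i - queue_size] if i >= queue_size else 0
        issue = max(port_free, slot_free)
        port_free = issue + transfer
        arrive = issue + mem_latency + transfer
        start = max(arrive, prev_end)
        starts.append(start)
        prev_end = start + compute
    return prev_end


if __name__ == "__main__":
    print("coupled:", coupled_cycles(4, 10, 4, 4))
    for q in (1, 2, 8):
        print("decoupled, queue_size =", q, ":", decoupled_cycles(4, 10, 4, 4, q))
```

With these illustrative parameters the coupled model takes 72 cycles, while the decoupled model takes 60, 36, and 30 cycles for queue sizes 1, 2, and 8. A small queue already recovers most of the overlap, which mirrors, in toy form, the queue-sizing tradeoff the paper studies. Setting `mem_latency` to 0 still shows a gain, because loads overlap computation.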