Simultaneous Multithreaded Vector Architecture: Merging ILP and DLP for High Performance

  • Authors:
  • Roger Espasa; Mateo Valero

  • Venue:
  • HIPC '97 Proceedings of the Fourth International Conference on High-Performance Computing
  • Year:
  • 1997

Abstract

The goal of this paper is to show that instruction-level parallelism (ILP) and data-level parallelism (DLP) can be merged in a single simultaneous vector multithreaded architecture to execute regular vectorizable code at a performance level that cannot be achieved using either paradigm on its own. We will show that the combination of the two techniques yields very high performance at a low cost and a low complexity. We will show that this architecture achieves a sustained performance on numerical regular codes that is 20 times the performance that can be achieved with today's superscalar microprocessors. Moreover, we will show that the architecture can tolerate very large memory latencies, of up to 100 cycles, with a relatively small performance degradation. This high performance is independent of working set size or of locality considerations, since the DLP paradigm allows very efficient exploitation of a high performance flat memory bandwidth.