DLP + TLP Processors for the Next Generation of Media Workloads

  • Authors:
  • Jesus Corbal;Roger Espasa;Mateo Valero

  • Venue:
  • HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
  • Year:
  • 2001

Abstract

Future media workloads will require about two orders of magnitude more performance than current general-purpose processors achieve. High uni-threaded performance will be needed to meet real-time constraints, together with huge computational throughput, as the next generation of media workloads will be eminently multithreaded (MPEG-4/MPEG-7). To meet the challenge of providing both good uni-threaded performance and throughput, we propose combining the simultaneous multithreading (SMT) execution paradigm with the ability to execute media-oriented streaming µ-SIMD instructions. This paper evaluates the performance of two different aggressive SMT processors: one with conventional µ-SIMD extensions (such as MMX) and one with longer streaming vector µ-SIMD extensions. We show that future media workloads are, in fact, dominated by scalar performance. The combination of SMT and streaming vector µ-SIMD helps alleviate the performance bottleneck of the integer unit. SMT allows "hiding" vector execution underneath integer execution by overlapping the two types of computation, while streaming vector µ-SIMD reduces the pressure on issue width and fetch bandwidth and provides a powerful latency-tolerance mechanism that makes it possible to implement smart decoupled cache hierarchies.