Universal Mechanisms for Data-Parallel Architectures

  • Authors:
  • Karthikeyan Sankaralingam, Stephen W. Keckler, William R. Mark, Doug Burger

  • Affiliations:
  • Computer Architecture and Technology Laboratory, Department of Computer Sciences, The University of Texas at Austin (all authors)

  • Venue:
  • Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture
  • Year:
  • 2003

Abstract

Data-parallel programs are both growing in importance and increasing in diversity, resulting in specialized processors targeted at specific classes of these programs. This paper presents a classification scheme for data-parallel program attributes and proposes micro-architectural mechanisms to support applications with diverse behavior using a single reconfigurable architecture. We focus on four broad kinds of data-parallel programs: DSP/multimedia, scientific, networking, and real-time graphics workloads. While all of these programs exhibit high computational intensity, coarse-grain regular control behavior, and some regular memory access behavior, they vary widely in their computation requirements, fine-grain control behavior, and the frequency of other types of memory accesses. Based on this study of application attributes, this paper proposes a set of general micro-architectural mechanisms that enable a baseline architecture to be dynamically tailored to the demands of a particular application. These mechanisms provide efficient execution across a spectrum of data-parallel applications and can be applied to diverse architectures ranging from vector cores to conventional superscalar cores. Our results using a baseline TRIPS processor show that configuring the architecture to the demands of each application yields harmonic-mean performance improvements of 5% to 55% over scalable yet less flexible architectures, while performing competitively against specialized architectures.
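The results above are summarized as a harmonic mean across benchmarks. The following is a minimal sketch of how such a summary can be computed. It assumes per-benchmark speedups of a configurable design over a fixed baseline. The function name `hmean_improvement` and the input values are hypothetical illustrations, not data or code from the paper.

```python
from statistics import harmonic_mean


def hmean_improvement(speedups):
    """Return the harmonic-mean speedup minus 1, as a fraction (0.25 means 25%).

    Assumes each entry is a per-benchmark speedup (baseline time / new time),
    so values above 1.0 mean the configurable design is faster.
    """
    if not speedups or any(s <= 0 for s in speedups):
        raise ValueError("speedups must be a non-empty list of positive numbers")
    return harmonic_mean(speedups) - 1.0


if __name__ == "__main__":
    # Hypothetical speedups for three benchmarks (not results from the paper).
    example = [1.0, 1.5, 3.0]
    print(f"{hmean_improvement(example):.0%}")
```

With these hypothetical inputs, the harmonic mean is 3 / (1/1.0 + 1/1.5 + 1/3.0) = 1.5, which corresponds to a 50% improvement. Compared with an arithmetic mean (about 1.83 here), the harmonic mean weights benchmarks with small gains more heavily. As a result, a configuration that helps only a few applications is not overstated.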