Modeling technology impact on cluster microprocessor performance

  • Authors:
  • Lucian Codrescu;Steve Nugent;James Meindl;D. Scott Wills

  • Affiliations:
  • Motorola, Austin, TX;Microelectronics Research Center, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA;Microelectronics Research Center, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA;Microelectronics Research Center, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA

  • Venue:
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low power
  • Year:
  • 2003

Quantified Score

Hi-index 0.01

Visualization

Abstract

The growing speed gap between transistors and wire interconnects is forcing the development of distributed, or clustered, architectures. These designs partition the chip into small regions with fast intracluster communication. Longer latency is required to communicate between clusters. The hardware and/or software are responsible for scheduling instructions to clusters such that critical path communication occurs within a cluster. This paper presents GENEric SYstems Simulator (GENESYS), a technology modeling tool that captures a broad range of materials, device, circuit, and interconnect parameters across current and future semiconductor technology. This tool is used to explore the relationship between key technology parameters (intercluster wire delay and transistor switching delay) and key architecture parameters (super scalar versus multithreaded instruction dispatch, and value prediction support). GENESYS is used to predict intercluster latencies as VLSI technology advances. The study provides quantitative data showing how conventional superscalar performance is degraded with increasing wire latency. Threaded designs are more tolerant to wire delay. Optimal thread size changes with advancing VLSI technology, suggesting a highly adaptive architecture. Value prediction is shown to be useful in all cases, but provides more benefit to the multithreaded design.