Modeling technology impact on cluster microprocessor performance

Authors:
Lucian Codrescu; Steve Nugent; James Meindl; D. Scott Wills
Affiliations:
Motorola, Austin, TX; Microelectronics Research Center, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA; Microelectronics Research Center, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA; Microelectronics Research Center, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA
Venue:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low power
Year:
2003

Abstract

The growing speed gap between transistors and wire interconnects is forcing the development of distributed, or clustered, architectures. These designs partition the chip into small regions with fast intracluster communication; communication between clusters incurs longer latency. Hardware, software, or both are responsible for scheduling instructions onto clusters so that critical-path communication occurs within a cluster. This paper presents the GENEric SYstems Simulator (GENESYS), a technology modeling tool that captures a broad range of materials, device, circuit, and interconnect parameters across current and future semiconductor technologies. The tool is used to explore the relationship between key technology parameters (intercluster wire delay and transistor switching delay) and key architecture parameters (superscalar versus multithreaded instruction dispatch, and value prediction support). GENESYS is used to predict intercluster latencies as VLSI technology advances. The study provides quantitative data showing how conventional superscalar performance degrades with increasing wire latency. Threaded designs are more tolerant of wire delay. The optimal thread size changes with advancing VLSI technology, which suggests a highly adaptive architecture. Value prediction is shown to be useful in all cases, but provides more benefit to the multithreaded design.
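
To illustrate why the transistor/wire speed gap translates into longer intercluster latency, here is a minimal sketch. It is not GENESYS and does not reproduce any result from the paper. The function name, the fixed wire delay, and the clock periods are all hypothetical values chosen only to show the trend: if the time to cross between clusters stays roughly constant while the clock period shrinks with faster transistors, the same crossing costs more cycles.

```python
import math


def intercluster_latency_cycles(wire_delay_ps: float, clock_period_ps: float) -> int:
    """Whole clock cycles needed for a signal to travel between clusters.

    Hypothetical helper: a crossing that takes any fraction of a cycle
    is charged a full cycle, hence the ceiling.
    """
    if wire_delay_ps < 0 or clock_period_ps <= 0:
        raise ValueError("wire delay must be >= 0 and clock period must be > 0")
    return math.ceil(wire_delay_ps / clock_period_ps)


if __name__ == "__main__":
    # Assumed numbers for illustration only (not taken from the paper):
    # the intercluster wire delay is held fixed while the clock period
    # shrinks, as it would with faster transistors.
    WIRE_DELAY_PS = 400.0
    for clock_period_ps in (1000.0, 500.0, 250.0, 125.0):
        cycles = intercluster_latency_cycles(WIRE_DELAY_PS, clock_period_ps)
        print(f"clock period {clock_period_ps:6.0f} ps -> {cycles} cycle(s)")
```

Under these assumed values the latency grows from one cycle to four as the clock period falls from 1000 ps to 125 ps. This is the kind of trend the abstract describes GENESYS predicting from detailed technology parameters rather than from a fixed constant.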