As VLSI advances towards billions of fast transistors on a chip (Gigascale Integration, or GSI), it is becoming clear that interconnect issues will dominate. Conventional uniprocessor architectures, developed in an era when interconnect could largely be ignored, may be incompatible with this technology. This paper presents a quantitative exploration of architectural alternatives for gigascale technology, evaluating a set of candidate architectures in 100 nm technology that span a spectrum of uniprocessor and multiprocessor configurations. Results show that a system composed of a small number of moderately complex processors provides the best performance over a wide range of applications. Designs built around large, complex uniprocessors are limited by wire delay and fall short of parallel systems whenever even a small amount of explicit parallelism is available (greater than 10% of the workload). Conversely, highly parallel designs with many small processors are restricted in sequential environments with limited parallelism. The only designs capable of sustaining Moore's-law rates of performance growth require extremely parallel workloads.
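The headline tradeoff is essentially an Amdahl's-law argument: a wire-delay-limited complex core keeps only a thin per-instruction advantage over a moderate core, so a handful of moderate cores overtakes it once a modest fraction of the workload is parallel. The following Python is a minimal sketch of that reasoning, not the paper's evaluation methodology; all performance numbers (the 1.1x and 1.0x factors, the four-core count) are hypothetical assumptions chosen so the crossover lands near the ~10% parallelism threshold the abstract reports.

```python
# Illustrative Amdahl's-law sketch (hypothetical parameters, not the
# paper's simulated designs): one wire-delay-limited complex core vs.
# four moderate cores.

def speedup_uniprocessor(serial_perf: float) -> float:
    """One complex core runs the whole workload at serial_perf x baseline."""
    return serial_perf

def speedup_multiprocessor(n_cores: int, core_perf: float,
                           parallel_frac: float) -> float:
    """n_cores moderate cores, each core_perf x baseline; only
    parallel_frac of the workload spreads across the cores."""
    serial_time = (1.0 - parallel_frac) / core_perf
    parallel_time = parallel_frac / (core_perf * n_cores)
    return 1.0 / (serial_time + parallel_time)

if __name__ == "__main__":
    # Assumption: wire delay caps the complex core at only 1.1x a
    # moderate core, which runs at the 1.0x baseline.
    for frac in (0.0, 0.1, 0.2, 0.5, 0.9):
        uni = speedup_uniprocessor(serial_perf=1.1)
        multi = speedup_multiprocessor(n_cores=4, core_perf=1.0,
                                       parallel_frac=frac)
        print(f"parallel fraction {frac:.0%}: "
              f"uniprocessor {uni:.2f}x, 4-core {multi:.2f}x")
```

Under these assumed numbers the uniprocessor still edges ahead at a 10% parallel fraction (1.10x vs. 1.08x) but falls behind by 20% (1.10x vs. 1.18x), echoing the abstract's qualitative finding that even a small amount of explicit parallelism favors the multiprocessor.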