As VLSI advances towards billions of fast transistors on a chip (Gigascale Integration, or GSI), it is becoming clear that interconnect issues will dominate. Conventional uniprocessor architectures, developed in an era when interconnect could largely be ignored, may be incompatible with this technology. This paper presents a quantitative exploration of architectural alternatives for gigascale technology, evaluating a set of candidate architectures in 100 nm technology that span a spectrum of uniprocessor and multiprocessor configurations. Results show that a system composed of a small number of moderately complex processors provides the best performance over a wide range of applications. Designs built around large, complex uniprocessors are limited by wire delay and fall short of parallel systems whenever even a small amount of explicit parallelism is available (greater than 10% of the workload). Conversely, highly parallel designs with many small processors are restricted in sequential environments with limited parallelism. The only designs capable of sustaining Moore's-law rates of performance growth require extremely parallel workloads.
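The headline tradeoff is essentially an Amdahl's-law argument: a wire-delay-limited complex core keeps only a thin per-instruction advantage over a moderate core, so a handful of moderate cores overtakes it once a modest fraction of the workload is parallel. The following Python is a minimal sketch of that reasoning, not the paper's evaluation methodology; all performance numbers (the 1.1x and 1.0x factors, the four-core count) are hypothetical assumptions chosen so the crossover lands near the ~10% parallelism threshold the abstract reports.

```python
# Illustrative Amdahl's-law sketch (hypothetical parameters, not the
# paper's simulated designs): one wire-delay-limited complex core vs.
# four moderate cores.

def speedup_uniprocessor(serial_perf: float) -> float:
    """One complex core runs the whole workload at serial_perf x baseline."""
    return serial_perf

def speedup_multiprocessor(n_cores: int, core_perf: float,
                           parallel_frac: float) -> float:
    """n_cores moderate cores, each core_perf x baseline; only
    parallel_frac of the workload spreads across the cores."""
    serial_time = (1.0 - parallel_frac) / core_perf
    parallel_time = parallel_frac / (core_perf * n_cores)
    return 1.0 / (serial_time + parallel_time)

if __name__ == "__main__":
    # Assumption: wire delay caps the complex core at only 1.1x a
    # moderate core, which runs at the 1.0x baseline.
    for frac in (0.0, 0.1, 0.2, 0.5, 0.9):
        uni = speedup_uniprocessor(serial_perf=1.1)
        multi = speedup_multiprocessor(n_cores=4, core_perf=1.0,
                                       parallel_frac=frac)
        print(f"parallel fraction {frac:.0%}: "
              f"uniprocessor {uni:.2f}x, 4-core {multi:.2f}x")
```

Under these assumed numbers the uniprocessor still edges ahead at a 10% parallel fraction (1.10x vs. 1.08x) but falls behind by 20% (1.10x vs. 1.18x), echoing the abstract's qualitative finding that even a small amount of explicit parallelism favors the multiprocessor.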