The growing speed gap between transistors and wire interconnects is forcing the development of distributed, or clustered, architectures. These designs partition the chip into small regions with fast intracluster communication; communication between clusters incurs longer latency. Hardware, software, or both must schedule instructions onto clusters so that critical-path communication occurs within a cluster. This paper presents the GENEric SYstems Simulator (GENESYS), a technology modeling tool that captures a broad range of materials, device, circuit, and interconnect parameters across current and future semiconductor technologies. The tool is used to explore the relationship between key technology parameters (intercluster wire delay and transistor switching delay) and key architecture parameters (superscalar versus multithreaded instruction dispatch, and value prediction support). GENESYS is used to predict intercluster latencies as VLSI technology advances. The study provides quantitative data showing how conventional superscalar performance degrades with increasing wire latency, while threaded designs are more tolerant of wire delay. The optimal thread size changes as VLSI technology advances, which argues for a highly adaptive architecture. Value prediction is shown to be useful in all cases, but provides more benefit to the multithreaded design.
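The scheduling concern described above can be illustrated with a toy model. The sketch below is not the GENESYS model or the paper's simulator; the function names, the one-cycle operation latency, and the delay figures are all assumptions chosen for illustration. It converts a wire delay into whole clock cycles and then computes the critical-path length of a small dependence graph under two cluster assignments.

```python
from math import ceil

def intercluster_cycles(wire_delay_ps, clock_period_ps):
    """Hypothetical helper: cycles needed to cross between clusters,
    rounding the wire delay up to a whole number of clock periods."""
    return max(1, ceil(wire_delay_ps / clock_period_ps))

def critical_path_cycles(ops, deps, cluster_of, hop_cycles, op_cycles=1):
    """Length in cycles of the longest dependence chain.

    ops        -- operation names in topological order
    deps       -- maps an operation to the operations producing its inputs
    cluster_of -- maps an operation to the cluster it is assigned to
    hop_cycles -- extra cycles when producer and consumer sit in different clusters
    op_cycles  -- assumed execution latency of every operation
    """
    finish = {}
    for op in ops:
        start = 0
        for producer in deps.get(op, []):
            penalty = hop_cycles if cluster_of[producer] != cluster_of[op] else 0
            start = max(start, finish[producer] + penalty)
        finish[op] = start + op_cycles
    return max(finish.values())

# Assumed numbers: a 200 ps intercluster wire and a 100 ps clock period.
hop = intercluster_cycles(200, 100)

# A chain a -> b -> c -> d, with an independent operation e also feeding d.
ops = ["a", "b", "c", "e", "d"]
deps = {"b": ["a"], "c": ["b"], "d": ["c", "e"]}

# Critical chain kept inside cluster 0; only e sits in cluster 1.
chain_local = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1}
# Critical chain bouncing between clusters on every step.
chain_split = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0}

print(critical_path_cycles(ops, deps, chain_local, hop))  # 4
print(critical_path_cycles(ops, deps, chain_split, hop))  # 10
```

In this toy setting, keeping the dependent chain inside one cluster hides the intercluster hop behind the off-path operation, whereas splitting the chain pays the hop on every step. Raising the assumed wire delay widens the gap between the two placements.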