A Simulation Study of Decoupled Architecture Computers
IEEE Transactions on Computers
Available instruction-level parallelism for superscalar and superpipelined machines
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Limits on multiple instruction issue
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
A Performance Comparison of the IBM RS/6000 and the Astronautics ZS-1
Computer - Special issue on experimental research in computer architecture
High-bandwidth data memory systems for superscalar processors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
An analysis of MIPS and SPARC instruction set utilization on the SPEC benchmarks
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Performance characteristics of architectural features of the IBM RISC System/6000
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
An architecture for software-controlled data prefetching
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Instruction level profiling and evaluation of the IBM/6000
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Single instruction stream parallelism is greater than two
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Exploiting fine-grained parallelism through a combination of hardware and software techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Memory latency effects in decoupled architectures with a single data memory module
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Evaluation of the WM architecture
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
An analysis of the information content of address and data reference streams
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Characterization of alpha AXP performance using TP and SPEC workloads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Compiler-directed data prefetching in multiprocessors with memory hierarchies
ICS '90 Proceedings of the 4th international conference on Supercomputing
Two-ported cache alternatives for superscalar processors
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
IEEE Micro
Memory Latency Effects in Decoupled Architectures
IEEE Transactions on Computers
Information content of CPU memory referencing behavior
ISCA '77 Proceedings of the 4th annual symposium on Computer architecture
Software methods for improvement of cache performance on supercomputer applications
Software methods for improvement of cache performance on supercomputer applications
Compiler-Directed Dynamic Frequency and Voltage Scheduling
PACS '00 Proceedings of the First International Workshop on Power-Aware Computer Systems-Revised Papers
Implications of Executing Compression and Encryption Applications on General Purpose Processors
IEEE Transactions on Computers
Measuring Benchmark Similarity Using Inherent Program Characteristics
IEEE Transactions on Computers
Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite
Proceedings of the 34th annual international symposium on Computer architecture
OUTRIDER: efficient memory latency tolerance with decoupled strands
Proceedings of the 38th annual international symposium on Computer architecture
Hi-index | 0.01 |
Information on the behavior of programs is essential for deciding the number and nature of functional units in high performance architectures. In this paper, we present studies on the balance of access and computation tasks on a typical RISC architecture, the MIPS. The MIPS programs are analyzed to find the demands they place on the memory system and the floating point or integer computation units. A balance metric that indicates the match of accessing power to computation power is calculated. It is observed that many of the SPEC floating point programs and kernels from supercomputing applications typically considered as computation intensive programs, place extensive demands on the memory system in terms of memory bandwidth. Access related instructions are seen to dominate most instruction streams. We discuss how these instruction stream characteristics can limit the instruction issue in superscalar processors. The properties of the dynamic instruction mix are used to alert computer architects to the importance of memory bandwidth. Single instruction stream parallelism will not be much greater than two if memory bandwidth is only one. A decoupled access/execute architecture with multiple load/store units and queues which alleviate the balance problem is presented.