Performance tradeoffs in cache design
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
High-bandwidth data memory systems for superscalar processors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Performance optimization of pipelined primary cache
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Reducing memory latency via non-blocking and prefetching caches
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Tradeoffs in processor/memory interfaces for superscalar processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Cache write policies and performance
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Instruction-level parallel processing: history, overview, and perspective
The Journal of Supercomputing - Special issue on instruction-level parallelism
Tradeoffs in two-level on-chip caching
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Characterization of alpha AXP performance using TP and SPEC workloads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Complexity/performance tradeoffs with non-blocking loads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
A study of single-chip processor/cache organizations for large numbers of transistors
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
A unified architectural tradeoff methodology
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Resource allocation in a high clock rate microprocessor
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Circuit implementation of a 300-MHz 64-bit second-generation CMOS Alpha CPU
Digital Technical Journal - Special 10th anniversary issue
Internal organization of the Alpha 21164, a 300-MHz 64-bit quad-issue CMOS RISC microprocessor
Digital Technical Journal - Special 10th anniversary issue
Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
The memory wall and the CMOS end-point
ACM SIGARCH Computer Architecture News
The impact of architectural trends on operating system performance
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Missing the memory wall: the case for processor/memory integration
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Increasing cache port efficiency for dynamic superscalar microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Embra: fast and flexible machine simulation
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Benchmark Handbook: For Database and Transaction Processing Systems
Benchmark Handbook: For Database and Transaction Processing Systems
Complete Computer System Simulation: The SimOS Approach
IEEE Parallel & Distributed Technology: Systems & Technology
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Advanced performance features of the 64-bit PA-8000
COMPCON '95 Proceedings of the 40th IEEE Computer Society International Conference
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Performance Factors for Superscalar Processors
Performance Factors for Superscalar Processors
High Performance Cache Architectures to Support Dynamic Superscalar Microprocessors
High Performance Cache Architectures to Support Dynamic Superscalar Microprocessors
Exploiting dead value information
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Decoupling local variable accesses in a wide-issue superscalar processor
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
IEEE Transactions on Computers
Hardware spatial forwarding for widely shared data
Proceedings of the 14th international conference on Supercomputing
Design and Evaluation of a Switch Cache Architecture for CC-NUMA Multiprocessors
IEEE Transactions on Computers
High Bandwidth On-Chip Cache Design
IEEE Transactions on Computers
Cache decay: exploiting generational behavior to reduce cache leakage power
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Let caches decay: reducing leakage energy via exploitation of cache generational behavior
ACM Transactions on Computer Systems (TOCS)
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
On cache memory hierarchy for Chip-Multiprocessor
ACM SIGARCH Computer Architecture News
Design and Optimization of Large Size and Low Overhead Off-Chip Caches
IEEE Transactions on Computers
A case for shared instruction cache on chip multiprocessors running OLTP
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
Scalable cache memory design for large-scale SMT architectures
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Hierarchical memory system design for a heterogeneous multi-core processor
Proceedings of the 2008 ACM symposium on Applied computing
Hi-index | 0.01 |
In this paper we evaluate the performance of high bandwidth caches that employ multiple ports, multiple cycle hit times, on-chip DRAM, and a line buffer to find the organization that provides the best processor performance. Processor performance is measured in execution time using a dynamic superscalar processor running realistic benchmarks that include operating system references. The results show that a large dual-ported multi-cycle pipelined SRAM cache with a line buffer maximizes processor performance. A large pipelined cache provides both a low miss rate and a high CPU clock frequency. Dual-porting the cache and the use of a line buffer provide the bandwidth needed by a dynamic superscalar processor. In addition, the line buffer makes the pipelined dual-ported cache the best option by increasing cache port bandwidth and hiding cache latency.