Performance tradeoffs in cache design
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
High-bandwidth data memory systems for superscalar processors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Performance optimization of pipelined primary cache
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Reducing memory latency via non-blocking and prefetching caches
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Tradeoffs in processor/memory interfaces for superscalar processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Cache write policies and performance
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Instruction-level parallel processing: history, overview, and perspective
The Journal of Supercomputing - Special issue on instruction-level parallelism
Characterization of alpha AXP performance using TP and SPEC workloads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Complexity/performance tradeoffs with non-blocking loads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
A study of single-chip processor/cache organizations for large numbers of transistors
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
A unified architectural tradeoff methodology
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Resource allocation in a high clock rate microprocessor
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The impact of architectural trends on operating system performance
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Embra: fast and flexible machine simulation
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Benchmark Handbook: For Database and Transaction Processing Systems
Benchmark Handbook: For Database and Transaction Processing Systems
Complete Computer System Simulation: The SimOS Approach
IEEE Parallel & Distributed Technology: Systems & Technology
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Performance Factors for Superscalar Processors
Performance Factors for Superscalar Processors
Using the SimOS machine simulator to study complex computer systems
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
ACM Transactions on Computer Systems (TOCS)
Increasing memory bandwidth with wide buses: compiler, hardware and performance trade-offs
ICS '97 Proceedings of the 11th international conference on Supercomputing
Data caches for superscalar processors
ICS '97 Proceedings of the 11th international conference on Supercomputing
Designing high bandwidth on-chip caches
Proceedings of the 24th annual international symposium on Computer architecture
On high-bandwidth data cache design for multi-issue processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Streamlining inter-operation memory communication via data dependence prediction
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Functional Implementation Techniques for CPU Cache Memories
IEEE Transactions on Computers - Special issue on cache memory and related problems
Speculation techniques for improving load related instruction scheduling
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Decoupling local variable accesses in a wide-issue superscalar processor
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Access region locality for high-bandwidth processor memory system design
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Using complete system simulation to characterize SPECjvm98 benchmarks
Proceedings of the 14th international conference on Supercomputing
Hardware spatial forwarding for widely shared data
Proceedings of the 14th international conference on Supercomputing
Speculative Memory Cloaking and Bypassing
International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II
High Bandwidth On-Chip Cache Design
IEEE Transactions on Computers
A High-Bandwidth Memory Pipeline for Wide Issue Processors
IEEE Transactions on Computers
Reducing Memory Latency via Read-after-Read Memory Dependence Prediction
IEEE Transactions on Computers
Implementing optimizations at decode time
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Power-efficient prefetching via bit-differential offset assignment on embedded processors
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Wire Delay is Not a Problem for SMT (In the Near Future)
Proceedings of the 31st annual international symposium on Computer architecture
Scalable cache memory design for large-scale SMT architectures
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
RECAST: Boosting Tag Line Buffer Coverage in Low-Power High-Level Caches "for Free"
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Exploiting the replication cache to improve performance for multiple-issue microprocessors
ACM SIGARCH Computer Architecture News - Special issue: MEDEA 2004 workshop
Exploiting the replication cache to improve cache read bandwidth cost effectively
MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications , systems and architecture
Power-efficient prefetching for embedded processors
ACM Transactions on Embedded Computing Systems (TECS)
Access region cache with register guided memory reference partitioning
Journal of Systems Architecture: the EUROMICRO Journal
Dynamic partition of memory reference instructions – a register guided approach
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Dynamic instruction cascading on GALS microprocessors
PATMOS'05 Proceedings of the 15th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
Parallel matrix multiplication based on dynamic SMP clusters in SoC technology
ISPA'07 Proceedings of the 2007 international conference on Frontiers of High Performance Computing and Networking
Hi-index | 0.01 |
The memory bandwidth demands of modern microprocessors require the use of a multi-ported cache to achieve peak performance. However, multi-ported caches are costly to implement. In this paper we propose techniques for improving the bandwidth of a single cache port by using additional buffering in the processor, and by taking maximum advantage of a wider cache port. We evaluate these techniques using realistic applications that include the operating system. Our techniques using a single-ported cache achieve 91% of the performance of a dual-ported cache.