In this article, we discuss how long memory latencies and increased memory bandwidth requirements may affect the design of modern microprocessors and their memory systems. In particular, we examine the subtle trade-offs between memory latency and bandwidth. Through execution-driven simulation, we measure the fraction of time that several SPEC95 benchmarks spend computing, stalled for memory latency, and stalled for limited memory bandwidth. Our results show that as processors implement more aggressive latency tolerance techniques, limited memory bandwidth hurts program performance much more than long memory latencies do. Finally, we survey a range of strategies for mitigating bandwidth limitations and discuss the relative merits and disadvantages of each.
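The three-way split of execution time described above (computing, stalled for latency, stalled for bandwidth) can be illustrated with a small sketch. The abstract does not say how the split is computed, so the approach below is an assumption: simulate each benchmark three times, once with a perfect memory system, once with real latencies but unlimited bandwidth, and once with the full memory system, then attribute the differences between the runs to latency and bandwidth stalls. The names `decompose`, `TimeBreakdown`, and the cycle counts in the usage lines are hypothetical and are not results from the article.

```python
from dataclasses import dataclass


@dataclass
class TimeBreakdown:
    compute: float    # fraction of time spent computing
    latency: float    # fraction of time stalled on memory latency
    bandwidth: float  # fraction of time stalled on limited bandwidth


def decompose(t_perfect: float, t_infinite_bw: float, t_full: float) -> TimeBreakdown:
    """Split total run time into compute, latency, and bandwidth fractions.

    Assumed inputs (hypothetical, in any consistent time unit):
      t_perfect     -- run time with a perfect memory system
      t_infinite_bw -- run time with real latencies but unlimited bandwidth
      t_full        -- run time with the full, bandwidth-limited memory system
    """
    if not (0 < t_perfect <= t_infinite_bw <= t_full):
        raise ValueError("expected 0 < t_perfect <= t_infinite_bw <= t_full")
    return TimeBreakdown(
        compute=t_perfect / t_full,
        latency=(t_infinite_bw - t_perfect) / t_full,
        bandwidth=(t_full - t_infinite_bw) / t_full,
    )


# Hypothetical cycle counts, chosen only to illustrate the arithmetic.
breakdown = decompose(t_perfect=40.0, t_infinite_bw=60.0, t_full=100.0)
print(breakdown)
```

With these made-up inputs the three fractions sum to one, and the bandwidth share is the extra time that appears only when bandwidth is finite. Under this assumed scheme, a technique that hides latency shrinks the latency share but leaves the bandwidth share in place, which matches the trend the abstract reports.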