Evaluation of design alternatives for a multiprocessor microprocessor
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Missing the memory wall: the case for processor/memory integration
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
The SUIF Compiler System: a Parallelizing and Optimizing Research Compiler
The SUIF Compiler System: a Parallelizing and Optimizing Research Compiler
Exploiting On-Chip Memory Bandwidth in the VIRAM Compiler
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Asynchronous DRAM Design and Synthesis
ASYNC '03 Proceedings of the 9th International Symposium on Asynchronous Circuits and Systems
Optimizing thread throughput for multithreaded workloads on memory constrained CMPs
Proceedings of the 5th conference on Computing frontiers
System level multi-bank main memory configuration for energy reduction
PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
A case for exploiting subarray-level parallelism (SALP) in DRAM
Proceedings of the 39th Annual International Symposium on Computer Architecture
Hi-index | 0.00 |
A microprocessor integrated with DRAM on the same die has the potential to improve system performance by reducing the memory latency and improving the memory bandwidth. However, a high performance microprocessor will typically send more accesses than the DRAM can handle due to the long cycle time of the embedded DRAM, especially in applications with significant memory requirements. A multi-bank DRAM can hide the long cycle time by allowing the DRAM to process multiple accesses in parallel, but it will incur a significant area penalty and will therefore restrict the density of the embedded DRAM main memory. In this paper, we propose a hierarchical multi-bank DRAM architecture to achieve high system performance with a minimal area penalty. In this architecture, the independent memory banks are each divided into many semi-independent subbanks that share I/O and decoder resources. A hierarchical multi-bank DRAM with 4 main banks each composed of 32 subbanks occupies approximately the same area as a conventional 4 bank DRAM while performing like a 32 bank one - up to 65% better than a conventional 4 bank DRAM when integrated with a single-chip multiprocessor.