On randomly interleaved memories. Proceedings of the 1990 ACM/IEEE Conference on Supercomputing.
Pseudo-randomly interleaved memory. ISCA '91: Proceedings of the 18th Annual International Symposium on Computer Architecture.
The Chinese remainder theorem and the prime memory system. ISCA '93: Proceedings of the 20th Annual International Symposium on Computer Architecture.
Access ordering and effective memory bandwidth.
Hitting the memory wall: implications of the obvious. ACM SIGARCH Computer Architecture News.
Vector multiprocessors with arbitrated memory access. ISCA '95: Proceedings of the 22nd Annual International Symposium on Computer Architecture.
Maximizing memory bandwidth for streamed computations.
Algorithmic foundations for a parallel vector access memory system. Proceedings of the Twelfth Annual ACM Symposium on Parallel Algorithms and Architectures.
Dynamic access ordering for streamed computations. IEEE Transactions on Computers.
Reducing DRAM latencies with an integrated memory hierarchy design. HPCA '01: Proceedings of the 7th International Symposium on High-Performance Computer Architecture.
Chip multithreading: opportunities and challenges. HPCA '05: Proceedings of the 11th International Symposium on High-Performance Computer Architecture.
A study of performance impact of memory controller features in multi-processor server environment. WMPI '04: Proceedings of the 3rd Workshop on Memory Performance Issues, in conjunction with the 31st International Symposium on Computer Architecture.
Modern DRAM memory systems: performance analysis and scheduling algorithm.
DRAMsim: a memory system simulator. ACM SIGARCH Computer Architecture News, special issue: dasCMP'05.
The M5 simulator: modeling networked systems. IEEE Micro.
Memory scheduling for modern microprocessors. ACM Transactions on Computer Systems (TOCS).
Multi-channel memory subsystems are an efficient way to satisfy the high volume of memory requests issued by chip multiprocessors (CMPs). At the same time, the imbalance between memory bandwidth and bus performance opens up new opportunities to optimize return data before it is sent to the bus. This paper presents a new memory controller design for embedded CMP systems that targets the stage at which data in the return buffer is sent back over the bus. Our scheduling policy, called return data interleaving (RDI), interleaves the return data of pending requests in a round-robin manner; in addition, it sends the critical word of each request first. To evaluate our technique, we model an Intel XScale-based CMP, using the M5 simulator for CMP simulation and DRAMsim for memory subsystem simulation, and examine the performance of the MiBench and SPEC 2000 benchmarks. Simulation results show that for memory-bound benchmarks running on CMP systems with 6 to 16 cores, RDI improves execution time by 11% on average and by up to 16.9%.
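The abstract describes RDI only at a high level, so the following Python sketch illustrates its two ideas (round-robin interleaving of return data and critical-word-first ordering) under stated assumptions; it is not the authors' controller. It assumes, hypothetically, an 8-word cache line, one word transferred per bus beat, interleaving at word granularity, and that all pending requests are already in the return buffer. The names `ReturnRequest`, `rdi_schedule`, `fifo_schedule`, and `critical_word_arrival` are illustrative, and the FIFO baseline (whole lines sent in order, starting from word 0) is an assumed point of comparison.

```python
from collections import deque
from dataclasses import dataclass

# Assumption (hypothetical): 32-byte cache line of 4-byte words, one word per bus beat.
WORDS_PER_LINE = 8


@dataclass
class ReturnRequest:
    req_id: int
    critical_word: int  # offset of the word the requesting core is stalled on


def critical_word_first_order(critical_word: int, words: int = WORDS_PER_LINE) -> list[int]:
    """Wrap-around word order that starts at the critical word."""
    return [(critical_word + i) % words for i in range(words)]


def fifo_schedule(requests: list[ReturnRequest], words: int = WORDS_PER_LINE) -> list[tuple[int, int]]:
    """Assumed baseline: send whole lines one after another, word 0 first."""
    return [(r.req_id, w) for r in requests for w in range(words)]


def rdi_schedule(requests: list[ReturnRequest], words: int = WORDS_PER_LINE) -> list[tuple[int, int]]:
    """Sketch of RDI: one word per beat, requests served round-robin,
    each request's words ordered critical-word-first."""
    queue = deque(
        (r.req_id, deque(critical_word_first_order(r.critical_word, words)))
        for r in requests
    )
    beats: list[tuple[int, int]] = []
    while queue:
        req_id, remaining = queue.popleft()
        beats.append((req_id, remaining.popleft()))
        if remaining:  # requeue at the back until the whole line has been sent
            queue.append((req_id, remaining))
    return beats


def critical_word_arrival(beats: list[tuple[int, int]], req: ReturnRequest) -> int:
    """0-based bus beat at which the request's critical word is delivered."""
    return beats.index((req.req_id, req.critical_word))


if __name__ == "__main__":
    reqs = [ReturnRequest(0, 5), ReturnRequest(1, 0), ReturnRequest(2, 3)]
    fifo, rdi = fifo_schedule(reqs), rdi_schedule(reqs)
    for r in reqs:
        print(f"request {r.req_id}: FIFO beat {critical_word_arrival(fifo, r)}, "
              f"RDI beat {critical_word_arrival(rdi, r)}")
```

With three buffered requests whose critical words sit at offsets 5, 0 and 3, the FIFO baseline in this model delivers those words at beats 5, 8 and 19, while the RDI sketch delivers them at beats 0, 1 and 2; both send 24 beats in total. The trade-off in this simplified model is that the remaining words of each line finish later, because the lines share the bus.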