Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Decoupled DIMM: building high-bandwidth memory system using low-speed DRAM devices
Proceedings of the 36th annual international symposium on Computer architecture
High-bandwidth network memory system through virtual pipelines
IEEE/ACM Transactions on Networking (TON)
A Low-Latency and Memory-Efficient On-chip Network
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Software-hardware cooperative DRAM bank partitioning for chip multiprocessors
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
FPGA implementation & comparison of current trends in memory scheduler for multimedia application
Proceedings of the International Conference & Workshop on Emerging Trends in Technology
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Utilizing RF-I and intelligent scheduling for better throughput/watt in a mobile GPU memory system
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Memory access schedule minimization for embedded systems
Journal of Systems Architecture: the EUROMICRO Journal
Hierarchical memory scheduling for multimedia MPSoCs
Proceedings of the International Conference on Computer-Aided Design
A high performance adaptive miss handling architecture for chip multiprocessors
Transactions on High-Performance Embedded Architectures and Compilers IV
Rank idle time prediction driven last-level cache writeback
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Improving writeback efficiency with decoupled last-write prediction
Proceedings of the 39th Annual International Symposium on Computer Architecture
Conservative open-page policy for mixed time-criticality memory controllers
Proceedings of the Conference on Design, Automation and Test in Europe
Navigating big data with high-throughput, energy-efficient data partitioning
Proceedings of the 40th Annual International Symposium on Computer Architecture
Hi-index | 0.00 |
Utilizing the nonuniform latencies of SDRAM devices, access reordering mechanisms alter the sequence of main memory access streams to reduce the observed access latency. Using a revised M5 simulator with an accurate SDRAM module, the burst scheduling access reordering mechanism is proposed and compared to conventional in order memory scheduling as well as existing academic and industrial access reordering mechanisms. With burst scheduling, memory accesses to the same rows of the same banks are clustered into bursts to maximize bus utilization of the SDRAM device. Subject to a static threshold, memory reads are allowed to preempt ongoing writes for reduced read latency, while qualified writes are piggybacked at the end of bursts to exploit row locality in writes and prevent write queue saturation. Performance improvements contributed by read preemption and write piggybacking are identi-field. Simulation results show that burst scheduling reduces the average execution time of selected SPEC CPU2000 benchmarks by 21% over conventional bank in order memory scheduling. Burst scheduling also outperforms Intel's patented out of order memory scheduling and the row hit access reordering mechanism by 11% and 6% respectively.