Increasing the number of strides for conflict-free vector access
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Access ordering and effective memory bandwidth
Access ordering and effective memory bandwidth
Synchronization and communication in the T3E multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Memory systems and pipelined processors
Memory systems and pipelined processors
Hardware-only stream prefetching and dynamic access ordering
Proceedings of the 14th international conference on Supercomputing
Proceedings of the 27th annual international symposium on Computer architecture
Dynamic Access Ordering for Streamed Computations
IEEE Transactions on Computers
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Imagine: Media Processing with Streams
IEEE Micro
Performance optimizations and bounds for sparse matrix-vector multiply
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Impulse: Building a Smarter Memory Controller
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Performance analysis of the Alpha 21364-based HP GS1280 multiprocessor
Proceedings of the 30th annual international symposium on Computer architecture
Design and implementation of the POWER5™ microprocessor
Proceedings of the 41st annual Design Automation Conference
POWER4 system microarchitecture
IBM Journal of Research and Development
The bit-reversal SDRAM address mapping
SCOPES '05 Proceedings of the 2005 workshop on Software and compilers for embedded systems
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Memory Prefetching Using Adaptive Stream Detection
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Memory scheduling for modern microprocessors
ACM Transactions on Computer Systems (TOCS)
Memory performance attacks: denial of memory service in multi-core systems
SS'07 Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium
Optimizing thread throughput for multithreaded workloads on memory constrained CMPs
Proceedings of the 5th conference on Computing frontiers
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
3D-Stacked Memory Architectures for Multi-core Processors
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Prefetch-Aware DRAM Controllers
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Decoupled DIMM: building high-bandwidth memory system using low-speed DRAM devices
Proceedings of the 36th annual international symposium on Computer architecture
Prediction in Dynamic SDRAM Controller Policies
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Software-hardware cooperative DRAM bank partitioning for chip multiprocessors
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Probabilistic Distance-Based Arbitration: Providing Equality of Service for Many-Core CMPs
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
FPGA implementation & comparison of current trends in memory scheduler for multimedia application
Proceedings of the International Conference & Workshop on Emerging Trends in Technology
A bursty multi-port memory controller with quality-of-service guarantees
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Memory access schedule minimization for embedded systems
Journal of Systems Architecture: the EUROMICRO Journal
Hierarchical memory scheduling for multimedia MPSoCs
Proceedings of the International Conference on Computer-Aided Design
Parallel application memory scheduling
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Reducing memory interference in multicore systems via application-aware memory channel partitioning
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Rank idle time prediction driven last-level cache writeback
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
DRAM power-aware rank scheduling
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Survey of scheduling techniques for addressing shared resources in multicore processors
ACM Computing Surveys (CSUR)
Improving memory scheduling via processor-side load criticality information
Proceedings of the 40th Annual International Symposium on Computer Architecture
A network congestion-aware memory subsystem for manycore
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Wireless Health Systems, On-Chip and Off-Chip Network Architectures
Reducing DRAM row activations with eager read/write clustering
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
As memory performance becomes increasingly important to overall system performance, the need to carefully schedule memory operations also increases. This paper presents a new approach to memory scheduling that considers the history of recently scheduled operations. This history-based approach provides two conceptual advantages: (1) it allows the scheduler to better reason about the delays associated with its scheduling decisions, and (2) it allows the scheduler to select operations so that they match the program's mixture of Reads and Writes, thereby avoiding certain bottlenecks within the memory controller. We evaluate our solution using a cycle-accurate simulator for the recently announced IBM Power5. When compared with an in-order scheduler, our solution achieves IPC improvements of 10.9% on the NAS benchmarks and 63% on the data-intensive Stream benchmarks. Using microbenchmarks, we illustrate the growing importance of memory scheduling in the context of CMP's, hardware controlled prefetching, and faster CPU speeds.