Memory access scheduling

Authors:
Scott Rixner;William J. Dally;Ujval J. Kapasi;Peter Mattson;John D. Owens
Affiliations:
Electrical Engineering, Massachusetts Institute of Technology and Computer Systems Laboratory, Stanford University, Stanford, CA;Computer Systems Laboratory, Stanford University, Stanford, CA;Computer Systems Laboratory, Stanford University, Stanford, CA;Computer Systems Laboratory, Stanford University, Stanford, CA;Computer Systems Laboratory, Stanford University, Stanford, CA
Venue:
Proceedings of the 27th annual international symposium on Computer architecture
Year:
2000

Citing 14
Cited 173

Missing the memory wall: the case for processor/memory integration

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
A bandwidth-efficient architecture for media processing

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Performance of image and video processing with general-purpose processors and media ISA extensions

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
A performance comparison of contemporary DRAM architectures

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Guest Editors' Introduction: Media Processing: A New Design Target

IEEE Micro
A Case for Intelligent RAM

IEEE Micro
Direct Rambus Technology: The New Main Memory Standard

IEEE Micro
Lockup-free instruction fetch/prefetch cache organization

ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
A Characterization of Processor Performance in the vax-11/780

ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Access ordering and memory-conscious cache utilization

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Access Order and Effective Bandwidth for Streams on a Direct Rambus Memory

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Impulse: Building a Smarter Memory Controller

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Command Vector Memory Systems: High Performance at Low Cost

PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques

Polygon rendering on a stream architecture

HWWS '00 Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware
A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Access pattern based local memory customization for low power embedded systems

Proceedings of the conference on Design, automation and test in Europe
New directions in compiler technology for embedded systems (embedded tutorial)

Proceedings of the 2001 Asia and South Pacific Design Automation Conference
A study of memory system performance of multimedia applications

Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Designing a Modern Memory Hierarchy with Hardware Prefetching

IEEE Transactions on Computers
Comparing Reyes and OpenGL on a stream architecture

Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
System and architecture-level power reduction of microprocessor-based communication and multi-media applications

Proceedings of the 2000 IEEE/ACM international conference on Computer-aided design
Imagine: Media Processing with Streams

IEEE Micro
Analysis of performance bottlenecks in multithreaded multiprocessor systems

Fundamenta Informaticae - Application of concurrency to system design
A 64Mbit Mesochronous Hybrid Wave Pipelined Multibank DRAM Macro

IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Aggressive Memory-Aware Compilation

IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
A network flow approach to memory bandwidth utilization in embedded DSP core processors

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Xtream-Fit: an energy-delay efficient data memory subsystem for embedded media processing

Proceedings of the 40th annual Design Automation Conference
Exploring the VLSI Scalability of Stream Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Efficient use of memory bandwidth to improve network processor throughput

Proceedings of the 30th annual international symposium on Computer architecture
An Analysis of Cache Performance of Multimedia Applications

IEEE Transactions on Computers
Adaptive History-Based Memory Schedulers

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Memory Controller Optimizations for Web Servers

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Micro-architecture techniques in the intel® E8870 scalable memory controller

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
A study of performance impact of memory controller features in multi-processor server environment

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Traffic shaping for an FPGA based SDRAM controller with complex QoS requirements

Proceedings of the 42nd annual Design Automation Conference
Adaptive History-Based Memory Schedulers for Modern Processors

IEEE Micro
Database hash-join algorithms on multithreaded computer architectures

Proceedings of the 3rd conference on Computing frontiers
Exploiting locality to ameliorate packet queue contention and serialization

Proceedings of the 3rd conference on Computing frontiers
The bit-reversal SDRAM address mapping

SCOPES '05 Proceedings of the 2005 workshop on Software and compilers for embedded systems
The design space of data-parallel memory systems

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Virtually Pipelined Network Memory

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Fair Queuing Memory Systems

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Improving fairness, throughput and energy-efficiency on a chip multiprocessor through DVFS

ACM SIGARCH Computer Architecture News
A 64-bit stream processor architecture for scientific applications

Proceedings of the 34th annual international symposium on Computer architecture
QoS policies and architecture for cache/memory in CMP platforms

Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A high-end real-time digital film processing reconfigurable platform

EURASIP Journal on Embedded Systems
Predator: a predictable SDRAM memory controller

CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Memory scheduling for modern microprocessors

ACM Transactions on Computer Systems (TOCS)
Improving SDRAM access energy efficiency for low-power embedded systems

ACM Transactions on Embedded Computing Systems (TECS)
Memory performance attacks: denial of memory service in multi-core systems

SS'07 Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium
Exploiting program cyclic behavior to reduce memory latency in embedded processors

Proceedings of the 2008 ACM symposium on Applied computing
A small data cache for multimedia-oriented embedded systems

Journal of Systems Architecture: the EUROMICRO Journal
Optimizing thread throughput for multithreaded workloads on memory constrained CMPs

Proceedings of the 5th conference on Computing frontiers
A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
3D-Stacked Memory Architectures for Multi-core Processors

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Designing packet buffers for router linecards

IEEE/ACM Transactions on Networking (TON)
Distributed order scheduling and its application to multi-core dram controllers

Proceedings of the twenty-seventh ACM symposium on Principles of distributed computing
Memory organization with multi-pattern parallel accesses

Proceedings of the conference on Design, automation and test in Europe
Power Aware External Bus Arbitration for System-on-a-Chip Embedded Systems

Transactions on High-Performance Embedded Architectures and Compilers I
Efficient Memory Utilization for High-Speed FPGA-Based Hardware Emulators with SDRAMs

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Prefetch-Aware DRAM Controllers

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Mini-rank: Adaptive DRAM architecture for improving memory power efficiency

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Coordinated management of multiple interacting resources in chip multiprocessors: A machine learning approach

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A light-weight fairness mechanism for chip multiprocessor memory systems

Proceedings of the 6th ACM conference on Computing frontiers
Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware

ACM Transactions on Architecture and Code Optimization (TACO)
Decoupled DIMM: building high-bandwidth memory system using low-speed DRAM devices

Proceedings of the 36th annual international symposium on Computer architecture
Buffer management for colored packets with deadlines

Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
High-Performance Buffer Mapping to Exploit DRAM Concurrency in Multiprocessor DSP Systems

RSP '09 Proceedings of the 2009 IEEE/IFIP International Symposium on Rapid System Prototyping
Application development with the FlexWAFE real-time stream processing architecture for FPGAs

ACM Transactions on Embedded Computing Systems (TECS)
Prediction in Dynamic SDRAM Controller Policies

SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Exploiting Locality on the Cell/B.E. through Bypassing

SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
High-bandwidth network memory system through virtual pipelines

IEEE/ACM Transactions on Networking (TON)
An SDRAM-aware router for Networks-on-Chip

Proceedings of the 46th Annual Design Automation Conference
Future scaling of processor-memory interfaces

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Complexity effective memory access scheduling for many-core accelerator architectures

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Coordinated control of multiple prefetchers in multi-core systems

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Improving memory bank-level parallelism in the presence of prefetching

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Micro-pages: increasing DRAM efficiency with locality-aware data placement

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Implementing and optimizing a data-intensive hydrodynamics application on the stream processor

ICCSA'07 Proceedings of the 2007 international conference on Computational science and its applications - Volume Part III
The virtual write queue: coordinating DRAM and last-level cache policies

Proceedings of the 37th annual international symposium on Computer architecture
A Low-Latency and Memory-Efficient On-chip Network

NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
A Network Congestion-Aware Memory Controller

NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
WAYPOINT: scaling coherence to thousand-core architectures

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Handling the problems and opportunities posed by multiple on-chip memory controllers

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
An SDRAM-aware router for networks-on-chip

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems - Special section on the ACM IEEE international conference on formal methods and models for codesign (MEMOCODE) 2009
Software-hardware cooperative DRAM bank partitioning for chip multiprocessors

NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Elastic Refresh: Techniques to Mitigate Refresh Penalties in High Density Memory

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
A platform for high level synthesis of memory-intensive image processing algorithms

Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
Exploring the prefetcher/memory controller design space: an opportunistic prefetch scheduling strategy

ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
Making the Best of Temporal Locality: Just-in-Time Renaming and Lazy Write-Back on the Cell/B.E

International Journal of High Performance Computing Applications
FPGA implementation & comparison of current trends in memory scheduler for multimedia application

Proceedings of the International Conference & Workshop on Emerging Trends in Technology
Application specific memory access, reuse and reordering for SDRAM

ARC'11 Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications
Scalable QoS-aware memory controller for high-bandwidth packet memory

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors

Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Performance limitations of block-multithreaded distributed-memory systems

Winter Simulation Conference
Page placement in hybrid memory systems

Proceedings of the international conference on Supercomputing
Poster: revisiting virtual channel memory for performance and fairness on multi-core architecture

Proceedings of the international conference on Supercomputing
Prefetch-aware shared resource management for multi-core systems

Proceedings of the 38th annual international symposium on Computer architecture
Adaptive granularity memory systems: a tradeoff between storage efficiency and throughput

Proceedings of the 38th annual international symposium on Computer architecture
Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors

ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Elastic pipeline: addressing GPU on-chip shared memory bank conflicts

Proceedings of the 8th ACM International Conference on Computing Frontiers
T&I engine: traversal and intersection engine for hardware accelerated ray tracing

Proceedings of the 2011 SIGGRAPH Asia Conference
Power management of hybrid DRAM/PRAM-based main memory

Proceedings of the 48th Design Automation Conference
Matching cache access behavior and bit error pattern for high performance low Vcc L1 cache

Proceedings of the 48th Design Automation Conference
Bandwidth constrained coordinated HW/SW prefetching for multicores

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Memory controllers for high-performance and real-time MPSoCs: requirements, architectures, and future trends

CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A bursty multi-port memory controller with quality-of-service guarantees

CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Exploring the limits of GPGPU scheduling in control flow bound applications

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Utilizing RF-I and intelligent scheduling for better throughput/watt in a mobile GPU memory system

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Memory access schedule minimization for embedded systems

Journal of Systems Architecture: the EUROMICRO Journal
Power aware external bus arbitration for system-on-a-chip embedded systems

HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Improving System Energy Efficiency with Memory Rank Subsetting

ACM Transactions on Architecture and Code Optimization (TACO)
Hierarchical memory scheduling for multimedia MPSoCs

Proceedings of the International Conference on Computer-Aided Design
A fair thread-aware memory scheduling algorithm for chip multiprocessor

ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Optimizing SDRAM bandwidth for custom FPGA loop accelerators

Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
Minimalist open-page: a DRAM page-mode scheduling policy for the many-core era

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A resistive TCAM accelerator for data-intensive computing

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A register-file approach for row buffer caches in die-stacked DRAMs

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Parallel application memory scheduling

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Reducing memory interference in multicore systems via application-aware memory channel partitioning

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Efficiently enabling conventional block sizes for very large die-stacked DRAM caches

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multicore Memory Systems

ACM Transactions on Computer Systems (TOCS)
A high performance adaptive miss handling architecture for chip multiprocessors

Transactions on High-Performance Embedded Architectures and Compilers IV
SRP: symbiotic resource partitioning of the memory hierarchy in CMPs

HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
DIEF: an accurate interference feedback mechanism for chip multiprocessor memory systems

HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Deterministic execution model on COTS hardware

ARCS'12 Proceedings of the 25th international conference on Architecture of Computing Systems
A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC

Proceedings of the 49th Annual Design Automation Conference
Heterogeneous multi-channel: fine-grained DRAM control for both system performance and power efficiency

Proceedings of the 49th Annual Design Automation Conference
Hybrid DRAM/PRAM-based main memory for single-chip CPU/GPU

Proceedings of the 49th Annual Design Automation Conference
Write performance improvement by hiding R drift latency in phase-change RAM

Proceedings of the 49th Annual Design Automation Conference
Rank idle time prediction driven last-level cache writeback

Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Trace-driven simulation of memory system scheduling in multithread application

Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Multiple sub-row buffers in DRAM: unlocking performance and energy improvement opportunities

Proceedings of the 26th ACM international conference on Supercomputing
Unified memory optimizing architecture: memory subsystem control with a unified predictor

Proceedings of the 26th ACM international conference on Supercomputing
DRAM power-aware rank scheduling

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
RAIDR: Retention-Aware Intelligent DRAM Refresh

Proceedings of the 39th Annual International Symposium on Computer Architecture
PARDIS: a programmable memory controller for the DDRx interfacing standards

Proceedings of the 39th Annual International Symposium on Computer Architecture
BOOM: enabling mobile memory based low-power server DIMMs

Proceedings of the 39th Annual International Symposium on Computer Architecture
Improving writeback efficiency with decoupled last-write prediction

Proceedings of the 39th Annual International Symposium on Computer Architecture
A case for exploiting subarray-level parallelism (SALP) in DRAM

Proceedings of the 39th Annual International Symposium on Computer Architecture
Physically addressed queueing (PAQ): improving parallelism in solid state disks

Proceedings of the 39th Annual International Symposium on Computer Architecture
Staged memory scheduling: achieving high performance and scalability in heterogeneous systems

Proceedings of the 39th Annual International Symposium on Computer Architecture
The dynamic granularity memory system

Proceedings of the 39th Annual International Symposium on Computer Architecture
Optimizing datacenter power with memory system levers for guaranteed quality-of-service

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
A software memory partition approach for eliminating bank-level interference in multicore systems

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Analysis of Performance Bottlenecks in Multithreaded Multiprocessor Systems

Fundamenta Informaticae - Application of Concurrency to System Design
Survey of scheduling techniques for addressing shared resources in multicore processors

ACM Computing Surveys (CSUR)
A distributed interleaving scheme for efficient access to WideIO DRAM memory

Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Implementation and optimization of sparse matrix-vector multiplication on imagine stream processor

ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Analytical synthesis of bandwidth-efficient SDRAM address generators

Microprocessors & Microsystems
Timing effects of DDR memory systems in hard real-time multicore architectures: Issues and solutions

ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
High-performance and low-energy buffer mapping method for multiprocessor DSP systems

ACM Transactions on Embedded Computing Systems (TECS)
Regularities considered harmful: forcing randomness to memory accesses to reduce row buffer conflicts for multi-core, multi-bank systems

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Predicting Performance Impact of DVFS for Realistic Memory Systems

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Addressing End-to-End Memory Access Latency in NoC-Based Multicores

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Conservative row activation to improve memory power efficiency

Proceedings of the 27th international ACM conference on International conference on supercomputing
CMP off-chip bandwidth scheduling guided by instruction criticality

Proceedings of the 27th international ACM conference on International conference on supercomputing
Prefetching and cache management using task lifetimes

Proceedings of the 27th international ACM conference on International conference on supercomputing
Understanding and mitigating refresh overheads in high-density DDR4 DRAM systems

Proceedings of the 40th Annual International Symposium on Computer Architecture
Improving memory scheduling via processor-side load criticality information

Proceedings of the 40th Annual International Symposium on Computer Architecture
AC-DIMM: associative computing with STT-MRAM

Proceedings of the 40th Annual International Symposium on Computer Architecture
Orchestrated scheduling and prefetching for GPGPUs

Proceedings of the 40th Annual International Symposium on Computer Architecture
Reducing memory access latency with asymmetric DRAM bank organizations

Proceedings of the 40th Annual International Symposium on Computer Architecture
A network congestion-aware memory subsystem for manycore

ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Wireless Health Systems, On-Chip and Off-Chip Network Architectures
Pragmatic integration of an SRAM row cache in heterogeneous 3-D DRAM architecture using TSV

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
SGRT: a mobile GPU architecture for real-time ray tracing

Proceedings of the 5th High-Performance Graphics Conference
Distributed fair DRAM scheduling in network-on-chips architecture

Journal of Systems Architecture: the EUROMICRO Journal
Designing on-chip networks for throughput accelerators

ACM Transactions on Architecture and Code Optimization (TACO)
Effect of page frame allocation pattern on bank conflicts in multi-core systems

Proceedings of the 2013 Research in Adaptive and Convergent Systems
Memory-centric system interconnect design with hybrid memory cubes

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Reshaping cache misses to improve row-buffer locality in multicore systems

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
A programmable memory controller for the DDRx interfacing standards

ACM Transactions on Computer Systems (TOCS)
E3CC: A memory error protection scheme with novel address mapping for subranked and low-power memories

ACM Transactions on Architecture and Code Optimization (TACO)
Reducing DRAM row activations with eager read/write clustering

ACM Transactions on Architecture and Code Optimization (TACO)
Simultaneously optimizing DRAM cache hit latency and miss rate via novel set mapping policies

Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Dynamic bandwidth scaling for embedded DSPs with 3D-stacked DRAM and wide I/Os

Proceedings of the International Conference on Computer-Aided Design
Direct distributed memory access for CMPs

Journal of Parallel and Distributed Computing
Application-aware Memory System for Fair and Efficient Execution of Concurrent GPGPU Applications

Proceedings of Workshop on General Purpose Processing Using GPUs
Endurance-aware cache line management for non-volatile caches

ACM Transactions on Architecture and Code Optimization (TACO)
BPM/BPM+: Software-based dynamic memory partitioning mechanisms for mitigating DRAM bank-/channel-level interferences in multicore systems

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.01

Visualization

Abstract

The bandwidth and latency of a memory system are strongly dependent on the manner in which accesses interact with the “3-D” structure of banks, rows, and columns characteristic of contemporary DRAM chips. There is nearly an order of magnitude difference in bandwidth between successive references to different columns within a row and different rows within a bank. This paper introduces memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure. Conservative reordering, in which the first ready reference in a sequence is performed, improves bandwidth by 40% for traces from five media benchmarks. Aggressive reordering, in which operations are scheduled to optimize memory bandwidth, improves bandwidth by 93% for the same set of applications. Memory access scheduling is particularly important for media processors where it enables the processor to make the most efficient use of scarce memory bandwidth.