Memory access schedule minimization for embedded systems

Authors:
Jingtong Hu;Chun Jason Xue;Wei-Che Tseng;Qingfeng Zhuge;Yingchao Zhao;Edwin H.-M. Sha
Affiliations:
Department of Computer Science, University of Texas at Dallas, Richardson, TX, USA;Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong;Department of Computer Science, University of Texas at Dallas, Richardson, TX, USA;College of Computer Science, Chongqing University, Chongqing, China;Department of Computer Science, Caritas Institute of Higher Education, New Territories, Hong Kong;Department of Computer Science, University of Texas at Dallas, Richardson, TX, USA and College of Computer Science, Chongqing University, Chongqing, China
Venue:
Journal of Systems Architecture: the EUROMICRO Journal
Year:
2012

Abstract

The growing gap between microprocessor speed and DRAM speed is a major problem facing computer designers. Narrowing this gap requires improving DRAM speed and throughput. To that end, this paper proposes techniques that exploit the 3-stage access of contemporary DRAM chips by grouping accesses to the same row together and interleaving the execution of memory accesses to different banks. A family of Bubble Filling Scheduling (BFS) algorithms is proposed to minimize memory access schedule length and improve memory access time for embedded systems. In some application-specific embedded systems the memory access trace is known in advance, and this information can be fully utilized to generate efficient memory access schedules. The offline BFS algorithm generates schedules that are, on average, 47.49% shorter than in-order scheduling and 8.51% shorter than existing burst scheduling. When memory accesses arrive at a single memory controller in real time, they have to be scheduled as they come. The online BFS algorithm serves this purpose and generates schedules that are, on average, 58.47% shorter than in-order scheduling and 4.73% shorter than burst scheduling. To improve memory throughput and further reduce memory access schedule length, an architecture with dual memory controllers is proposed. According to the experimental results, the dual-controller algorithm generates schedules that are, on average, 62.89% shorter than in-order scheduling, 14.23% shorter than burst scheduling, and 10.07% shorter than the single-controller BFS algorithms.
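
As an illustration of why row grouping and bank interleaving shorten a schedule, the following is a minimal sketch under a toy timing model. It is not the BFS algorithm from the paper. The cycle counts (`T_PRE`, `T_ACT`, `T_COL`), the function names, and the sample trace are all hypothetical, and the model ignores data dependencies between accesses, refresh, and real DRAM timing constraints.

```python
# Toy model of 3-stage DRAM access: precharge, row activate, column access.
# All cycle counts below are assumed values chosen for illustration only.
T_PRE = 3  # cycles to precharge (close) the currently open row
T_ACT = 3  # cycles to activate (open) a row
T_COL = 1  # cycles for one column access on the shared data bus


def schedule_length(accesses, interleave=False):
    """Return the cycle at which the last access completes.

    accesses: list of (bank, row) pairs, issued in the given order.
    interleave: if True, a bank may prepare its row while another
    bank is using the data bus (idealized overlap).
    """
    open_row = {}   # bank -> row currently open in that bank
    bank_free = {}  # bank -> cycle at which the bank finished its last access
    bus_free = 0    # cycle at which the shared data bus becomes free
    for bank, row in accesses:
        start = bank_free.get(bank, 0) if interleave else bus_free
        if open_row.get(bank) == row:
            prep = 0  # row hit: only the column access is needed
        else:
            # Row miss: precharge only if some row is already open.
            prep = (T_PRE if bank in open_row else 0) + T_ACT
            open_row[bank] = row
        begin = max(start + prep, bus_free)
        bus_free = begin + T_COL
        bank_free[bank] = bus_free
    return bus_free


def group_by_row(accesses):
    """Reorder so accesses to the same (bank, row) are contiguous,
    keeping groups in order of first appearance."""
    groups = {}
    for access in accesses:
        groups.setdefault(access, []).append(access)
    return [a for group in groups.values() for a in group]


trace = [(0, 1), (0, 2), (0, 1), (1, 5), (0, 2), (1, 5)]
grouped = group_by_row(trace)
print(schedule_length(trace))                     # in-order
print(schedule_length(grouped))                   # rows grouped
print(schedule_length(grouped, interleave=True))  # grouped + banks overlapped
```

Under these assumed timings, the sample trace takes 30 cycles in order, 18 cycles once accesses to the same row are grouped, and 15 cycles when row preparation in one bank is also allowed to overlap with transfers from another. The numbers only show the direction of the effect; they are unrelated to the experimental results reported in the abstract.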