Efficient Loop Scheduling for Chip Multiprocessors with Non-Volatile Main Memory

Authors:
Jiayi Du;Yan Wang;Qingfeng Zhuge;Jingtong Hu;Edwin H. Sha
Affiliations:
College of Information Science and Engineering, Hunan University, Changsha, China 410082;College of Information Science and Engineering, Hunan University, Changsha, China 410082;College of Computer Science, Chongqing University, Chongqing, China 40044;Department of Computer Science, University of Texas at Dallas, Richardson, USA 75080;Department of Computer Science, University of Texas at Dallas, Richardson, USA 75080 and College of Computer Science, Chongqing University, Chongqing, China 40044
Venue:
Journal of Signal Processing Systems
Year:
2013

Abstract

Non-volatile memories (NVMs) show great potential to replace DRAM as the main memory in many embedded systems because of their attractive characteristics, such as low cost, high density, and low energy consumption. However, the asymmetry between read and write costs has to be addressed before the advantages of NVM can be fully exploited: a write operation on NVM is much more expensive than a read operation. Existing loop optimization techniques cannot be used effectively with non-volatile main memory because they do not take this asymmetry into account. In this paper, we propose an efficient loop scheduling algorithm, the Rotation with Maximum Bipartite Matching (RMBM) algorithm, to address the problem of expensive write operations on non-volatile main memory for chip multiprocessors (CMPs). It achieves high parallelism for a loop and, at the same time, reduces the number of write operations on NVM. The experimental results show that, compared with the traditional rotation scheduling algorithm, the RMBM algorithm reduces the number of write activities on NVM by 34.5% on average. Using the RMBM algorithm, the execution time is reduced by 20.5% and the energy consumption by 15.03% on average. With the proposed technique, the average lifetime of NVM can be extended by more than 2 times.
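The abstract names maximum bipartite matching as a building block of RMBM but does not say how the bipartite graph is constructed. The following is therefore only a minimal sketch of the generic matching step, using the standard augmenting-path method; it is not the RMBM algorithm itself. The function name, the vertex labels ("t1", "s1", and so on), and their reading as tasks and candidate slots are assumptions made purely for illustration.

```python
from typing import Dict, List, Set


def max_bipartite_matching(adj: Dict[str, List[str]]) -> Dict[str, str]:
    """Return a maximum matching as {right_vertex: left_vertex}.

    adj maps each left vertex to the right vertices it may be paired with.
    Uses the classic augmenting-path search (one DFS per left vertex).
    """
    match_of_right: Dict[str, str] = {}

    def try_augment(left: str, visited: Set[str]) -> bool:
        for right in adj.get(left, []):
            if right in visited:
                continue
            visited.add(right)
            # Take `right` if it is free, or if its current partner
            # can be re-matched to some other right vertex.
            if right not in match_of_right or try_augment(match_of_right[right], visited):
                match_of_right[right] = left
                return True
        return False

    for left in adj:
        try_augment(left, set())
    return match_of_right


# Hypothetical input: left vertices are tasks, right vertices are slots
# each task could occupy. These labels are illustrative assumptions only.
candidates = {
    "t1": ["s1", "s2"],
    "t2": ["s1"],
    "t3": ["s2", "s3"],
}
matching = max_bipartite_matching(candidates)
print(len(matching))  # size of the maximum matching
```

In this small example every task can be paired with a distinct slot, although the first greedy choice (t1 with s1) has to be undone by an augmenting path when t2 is processed. That re-matching is what distinguishes a maximum matching from a simple greedy assignment.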