Scheduling and behavioral transformation for parallel systems
Scheduling and behavioral transformation for parallel systems
Algorithmic transformation techniques for efficient exploration of alternative application instances
Proceedings of the tenth international symposium on Hardware/software codesign
Code size reduction technique and implementation for software-pipelined DSP applications
ACM Transactions on Embedded Computing Systems (TECS)
Power Efficient Processor Architecture and The Cell Processor
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Reducing off-chip memory access costs using data recomputation in embedded chip multi-processors
Proceedings of the 44th annual Design Automation Conference
A durable and energy efficient main memory using phase change memory technology
Proceedings of the 36th annual international symposium on Computer architecture
Scalable high performance main memory system using phase-change memory technology
Proceedings of the 36th annual international symposium on Computer architecture
Introduction to Algorithms, Third Edition
Introduction to Algorithms, Third Edition
Iterational retiming with partitioning: Loop scheduling with complete memory latency hiding
ACM Transactions on Embedded Computing Systems (TECS)
Write activity reduction on flash main memory via smart victim cache
Proceedings of the 20th symposium on Great lakes symposium on VLSI
Proceedings of the 47th Design Automation Conference
Optimizing nested loops with iterational and instructional retiming
EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
PCM-FTL: A Write-Activity-Aware NAND Flash Memory Management Scheme for PCM-Based Embedded Systems
RTSS '11 Proceedings of the 2011 IEEE 32nd Real-Time Systems Symposium
Rotation scheduling: a loop pipelining algorithm
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Optimally Maximizing Iteration-Level Loop Parallelism
IEEE Transactions on Parallel and Distributed Systems
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
Non-volatile memories (NVMs) show great potential in replacing DRAM as the main memory in many embedded systems because of their attractive characteristics such as low cost, high density, and low energy consumption. However, the problem of asymmetric read and write costs has to be addressed before the advantages of NVM can be fully exploited. That is, the cost of write operation is much more expensive than the cost of read operation on NVMs. The existing techniques for loop optimization cannot be used effectively with non-volatile main memory because this special feature is not considered. In this paper, we propose an efficient loop scheduling algorithm, the Rotation with Maximum Bipartite Matching (RMBM) algorithm, to address the problem of expensive write operations on non-volatile main memory for chip multiprocessors (CMPs). It achieves high parallelism for a loop and, at the same time, reduces the number of write operations on NVM. The experimental results show that the RMBM algorithm reduces the number of write activities on NVM by 34.5 % on average compared with the traditional rotation scheduling algorithm. The execution time is reduced by 20.5 %, and the energy consumption is also reduced by 15.03 % on average using the RMBM algorithm. In other words, the average lifetime of NVM can be extended by more than 2 times using the proposed technique.