PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
A performance study of software and hardware data prefetching schemes
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Optimizing Overall Loop Schedules Using Prefetching and Partitioning
IEEE Transactions on Parallel and Distributed Systems
On-chip vs. off-chip memory: the data partitioning problem in embedded processor-based systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Dynamic management of scratch-pad memory space
Proceedings of the 38th annual Design Automation Conference
An optimal memory allocation for application-specific multiprocessor system-on-chip
Proceedings of the 14th international symposium on Systems synthesis
Exploiting shared scratch pad memory space in embedded multiprocessor systems
Proceedings of the 39th annual Design Automation Conference
Scheduling Data-Flow Graphs via Retiming and Unfolding
IEEE Transactions on Parallel and Distributed Systems
Scratchpad memory: design alternative for cache on-chip memory in embedded systems
Proceedings of the tenth international symposium on Hardware/software codesign
Task Graph Extraction for Embedded System Synthesis
VLSID '03 Proceedings of the 16th International Conference on VLSI Design
Cluster assignment of global values for clustered VLIW processors
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
ISVLSI '06 Proceedings of the IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architectures
Customized on-chip memories for embedded chip multiprocessors
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
Loop Scheduling with Complete Memory Latency Hiding on Multi-core Architecture
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Multi-Level On-Chip Memory Hierarchy Design for Embedded Chip Multiprocessors
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Integrated scratchpad memory optimization and task scheduling for MPSoC architectures
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Partitioning and scheduling DSP applications with maximal memory access hiding
EURASIP Journal on Applied Signal Processing
Rotation scheduling: a loop pipelining algorithm
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
ILP optimal scheduling for multi-module memory
CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Algorithms for optimally arranging multicore memory structures
EURASIP Journal on Embedded Systems
Proceedings of the tenth ACM international conference on Embedded software
Hi-index | 0.00 |
One of the most critical components that determine the success of an MPSoC based architecture is its on-chip memory. Scratch Pad Memory (SPM) is increasingly being applied to substitute cache as the on-chip memory of embedded MPSoCs due to its superior chip area, power consumption and timing predictability. SPM can be organized as a Virtually Shared SPM (VS-SPM) architecture that takes advantage of both shared and private SPM. However, making effective use of the VS-SPM architecture strongly depends on two inter-dependent problems: variable partitioning and task scheduling. In this paper, we decouple these two problems and solve them in phase-ordered manner. We propose two variable partitioning heuristics based on an initial schedule: High Access Frequency First (HAFF) variable partitioning and Global View Prediction (GVP) variable partitioning. Then, we present a loop pipeline scheduling algorithm known as Rotation Scheduling with Variable Partitioning (RSVP) to improve overall throughput. Our experimental results obtained on MiBench show that the average performance improvements over IDAS (Integrated Data Assignment with Scheduling) are 23.74% for HAFF and 31.91% for GVP on four-core MPSoC. The average schedule length generated by RSVP is 25.96% shorter than that of list scheduling with optimal variable partition.