Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Characteristics of performance-optimal multi-level cache hierarchies
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Proceedings of the 6th international workshop on Hardware/software codesign
Principles of Optimal Page Replacement
Journal of the ACM (JACM)
The Complexity of Some Problems on Subsequences and Supersequences
Journal of the ACM (JACM)
An optimal memory allocation scheme for scratch-pad-based embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
International Journal of Parallel Programming
DSP processor/compiler co-design: a quantitative approach
ISSS '96 Proceedings of the 9th international symposium on System synthesis
Processor-memory coexploration using an architecture description language
ACM Transactions on Embedded Computing Systems (TECS)
A fast and accurate framework to analyze and optimize cache memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
A case for multi-level main memory
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Optimizing instruction cache performance of embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Memory-manager/scheduler co-design: optimizing event-driven servers to improve cache behavior
Proceedings of the 5th international symposium on Memory management
Multi-Level On-Chip Memory Hierarchy Design for Embedded Chip Multiprocessors
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Integrated scratchpad memory optimization and task scheduling for MPSoC architectures
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Reducing off-chip memory access costs using data recomputation in embedded chip multi-processors
Proceedings of the 44th annual Design Automation Conference
CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
A Phase-Adaptive Approach to Increasing Cache Performance
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
A compiler approach to managing storage and memory bandwidth in configurable architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Hiding cache miss penalty using priority-based execution for embedded processors
Proceedings of the conference on Design, automation and test in Europe
Communications of the ACM - Security in the Browser
Hybrid cache architecture with disparate memory technologies
Proceedings of the 36th annual international symposium on Computer architecture
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Integration of Code Scheduling, Memory Allocation, and Array Binding for Memory-Access Optimization
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Hi-index | 0.00 |
In multi-core Digital Signal Processing (DSP) Systems, the processor-memory gap remains the primary obstacle in improving system performance. This paper addresses this bottleneck by combining task scheduling and memory accesses so that the system architecture and memory modules of a multi-core DSP can be utilized as efficiently as possible. To improve the system and memory utilization, the key is to take advantage of locality as much as possible and integrate it into task scheduling. Two algorithms are proposed to optimize memory accesses while scheduling tasks with timing and resource constraints. The first one uses Integer Linear Programming (ILP) to produce a schedule with the most efficient memory access sequence while satisfying the constraints. The second one is a heuristic algorithm which can produce a near optimal schedule with polynomial running time. The experimental results show that the memory access cost can be reduced up to 60% while the schedule length is also shortened.