ACM Transactions on Embedded Computing Systems (TECS)
In this paper, we present a compiler strategy to optimize data accesses in regular array-intensive applications running on embedded multiprocessor environments. Specifically, we propose an optimization algorithm that reduces the extra off-chip memory accesses caused by inter-processor communication. It does so by increasing the application-wide reuse of data that resides in the processors' scratch-pad memories. Experimental results on four array-intensive image processing applications indicate that exploiting inter-processor data sharing can reduce the energy-delay product by as much as 33.8% (24.3% on average) on a four-processor embedded system. The results also show that the proposed strategy is robust, in the sense that it gives consistently good results over a wide range of architectural parameters.
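To make the idea of inter-processor data sharing concrete, the following is a minimal sketch of a toy cost model, not the algorithm proposed in the paper. It assumes that each processor has a private scratch-pad memory (SPM) holding a fixed set of data blocks, and that a miss in the local SPM can be served from another processor's SPM instead of going off-chip. The function names, the block-level granularity, and the per-access cost values are all hypothetical and chosen only for illustration.

```python
# Hypothetical per-access costs in arbitrary units (not taken from the paper).
LOCAL_SPM_COST = 1
REMOTE_SPM_COST = 3
OFF_CHIP_COST = 50


def simulate(accesses, spm_contents, share):
    """Return (off_chip_accesses, total_cost) for a trace of accesses.

    accesses     -- list of (processor_id, block_name) pairs
    spm_contents -- dict mapping processor_id to the set of blocks in its SPM
    share        -- if True, a local miss may be served by another SPM
    """
    off_chip = 0
    total_cost = 0
    for proc, block in accesses:
        if block in spm_contents[proc]:
            # Hit in the processor's own scratch-pad memory.
            total_cost += LOCAL_SPM_COST
        elif share and any(
            block in blocks
            for other, blocks in spm_contents.items()
            if other != proc
        ):
            # Miss locally, but another processor's SPM holds the block.
            total_cost += REMOTE_SPM_COST
        else:
            # The block has to be fetched from off-chip memory.
            off_chip += 1
            total_cost += OFF_CHIP_COST
    return off_chip, total_cost


def energy_delay_product(energy, delay):
    """Energy-delay product: the metric quoted in the abstract."""
    return energy * delay


# Two processors that each need a block held in the other's SPM.
trace = [(0, "A"), (1, "A"), (1, "B"), (0, "B")]
spms = {0: {"A"}, 1: {"B"}}

no_sharing = simulate(trace, spms, share=False)
with_sharing = simulate(trace, spms, share=True)
```

In this toy trace, the two cross-processor accesses go off-chip when sharing is disabled and are served from the remote SPM when it is enabled. A real compiler pass would also have to decide which data to place in each SPM and how to schedule the computation so that such sharing opportunities arise, which this sketch does not attempt.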