Exploiting shared scratch pad memory space in embedded multiprocessor systems

Authors:
Mahmut Kandemir; J. Ramanujam; A. Choudhary
Affiliations:
Pennsylvania State University, University Park, PA; Louisiana State University, Baton Rouge, LA; Northwestern University, Evanston, IL
Venue:
Proceedings of the 39th annual Design Automation Conference
Year:
2002

Citing 16
Cited 24

Compilers: principles, techniques, and tools

Compilers: principles, techniques, and tools
Strategies for cache and local memory management by global program transformation

Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Minimization of memory traffic in high-level synthesis

DAC '94 Proceedings of the 31st annual Design Automation Conference
Automatic optimization of communication in compiling out-of-core stencil codes

ICS '96 Proceedings of the 10th international conference on Supercomputing
Architectural exploration and optimization of local memory in embedded systems

ISSS '97 Proceedings of the 10th international symposium on System synthesis
Memory exploration for low power, embedded systems

Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Influence of compiler optimizations on system power

Proceedings of the 37th Annual Design Automation Conference
Energy-driven integrated hardware-software optimizations using SimplePower

Proceedings of the 27th annual international symposium on Computer architecture
Dynamic management of scratch-pad memory space

Proceedings of the 38th annual Design Automation Conference
Dependence Analysis for Supercomputing

Dependence Analysis for Supercomputing
Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design

Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design
Design of High-Performance Microprocessor Circuits

Design of High-Performance Microprocessor Circuits
High Performance Compilers for Parallel Computing

High Performance Compilers for Parallel Computing
DSP Processors Hit the Mainstream

Computer
Increasing Energy Efficiency of Embedded Systems by Application-Specific Memory Hierarchy Generation

IEEE Design & Test
Efficient Utilization of Scratch-Pad Memory in Embedded Processor Applications

EDTC '97 Proceedings of the 1997 European conference on Design and Test

Polynomial-time algorithm for on-chip scratchpad memory partitioning

Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Data compression for improving SPM behavior

Proceedings of the 41st annual Design Automation Conference
A post-compiler approach to scratchpad mapping of code

Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Exploiting Inter-Processor Data Sharing for Improving Behavior of Multi-Processor SoCs

ISVLSI '05 Proceedings of the IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design
Energy aware memory architecture configuration

MEDEA '04 Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications, systems and architecture
Power reduction techniques for microprocessor systems

ACM Computing Surveys (CSUR)
Banked scratch-pad memory management for reducing leakage energy consumption

Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
SPM Conscious Loop Scheduling for Embedded Chip Multiprocessors

ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Integrated scratchpad memory optimization and task scheduling for MPSoC architectures

CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Exploration of distributed shared memory architectures for NoC-based multiprocessors

Journal of Systems Architecture: the EUROMICRO Journal
Reducing off-chip memory access costs using data recomputation in embedded chip multi-processors

Proceedings of the 44th annual Design Automation Conference
A Framework for Task Scheduling and Memory Partitioning for Multi-Processor System-on-Chip

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Variable Partitioning and Scheduling for MPSoC with Virtually Shared Scratch Pad Memory

Journal of Signal Processing Systems
Scratchpad allocation for concurrent embedded software

ACM Transactions on Programming Languages and Systems (TOPLAS)
Heap data management for limited local memory (LLM) multi-core processors

CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Algorithms for optimally arranging multicore memory structures

EURASIP Journal on Embedded Systems
Compiler-guided leakage optimization for banked scratch-pad memories

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Static bus schedule aware scratchpad allocation in multiprocessors

Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Reducing memory space consumption through dataflow analysis

Computer Languages, Systems and Structures
FCC-SDP: a fast close-coupled shared data pool for multi-core DSPs

ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Write activity reduction on non-volatile main memories for embedded chip multiprocessors

ACM Transactions on Embedded Computing Systems (TECS)
Automatic and efficient heap data management for limited local memory multicore architectures

Proceedings of the Conference on Design, Automation and Test in Europe
A software-only scheme for managing heap data on limited local memory (LLM) multicore processors

ACM Transactions on Embedded Computing Systems (TECS)
Management and optimization for nonvolatile memory-based hybrid scratchpad memory on multicore embedded processors

ACM Transactions on Embedded Computing Systems (TECS)

Abstract

In this paper, we present a compiler strategy to optimize data accesses in regular array-intensive applications running on embedded multiprocessor environments. Specifically, we propose an optimization algorithm that targets the reduction of extra off-chip memory accesses caused by inter-processor communication. This is achieved by increasing the application-wide reuse of data that resides in the scratch-pad memories of the processors. Our experimental results, obtained on four array-intensive image-processing applications, indicate that exploiting inter-processor data sharing can reduce the energy-delay product by as much as 33.8% (24.3% on average) on a four-processor embedded system. The results also show that the proposed strategy is robust, in the sense that it gives consistently good results over a wide range of architectural parameters.