A novel instruction scratchpad memory optimization method based on concomitance metric

Authors:
Andhi Janapsatya;Aleksandar Ignjatović;Sri Parameswaran
Affiliations:
The University of New South Wales, Sydney, Australia;The University of New South Wales, Sydney, Australia;The University of New South Wales, Sydney, Australia
Venue:
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Year:
2006

Abstract

Scratchpad memory has been introduced as a replacement for cache memory because it improves the performance of certain embedded systems. It has also been shown to significantly reduce the energy consumption of the memory hierarchy, which matters because the memory hierarchy consumes a substantial proportion of the total energy of an embedded system. This paper deals with the optimization of the instruction scratchpad memory, based on a novel methodology that uses a metric we call concomitance. This metric is used to find basic blocks that are executed frequently and in close proximity in time. Once such blocks are found, they are copied into the scratchpad memory at appropriate times by a special instruction inserted into the code at appropriate places. For a set of benchmarks taken from MediaBench, our scratchpad system consumed on average just 59% of the energy of the cache system and 73% of the energy of the state-of-the-art scratchpad system, while improving overall performance. Compared with the state-of-the-art method, the number of instructions copied from main memory into the scratchpad memory is reduced by 88%.
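
The abstract does not give the formula for concomitance, so the following is only an illustrative sketch of the general idea of finding basic blocks that execute frequently and close together in time. It assumes a recorded trace of basic-block identifiers and uses a simple sliding-window co-occurrence count as a stand-in score. The function names (`proximity_scores`, `select_blocks`), the window length, the block sizes, and the greedy selection are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def proximity_scores(trace, window=3):
    """Count, for each unordered pair of distinct blocks, how many
    sliding windows of the trace contain both blocks.

    NOTE: this is an assumed proxy for "executed frequently and in
    close proximity in time", not the paper's concomitance metric.
    """
    scores = Counter()
    for start in range(len(trace) - window + 1):
        blocks = sorted(set(trace[start:start + window]))
        for pair in combinations(blocks, 2):
            scores[pair] += 1
    return scores


def select_blocks(scores, sizes, capacity):
    """Greedily pick blocks from the highest-scoring pairs until the
    (hypothetical) scratchpad capacity is exhausted."""
    chosen, used = [], 0
    for (a, b), _ in scores.most_common():
        for blk in (a, b):
            if blk not in chosen and used + sizes[blk] <= capacity:
                chosen.append(blk)
                used += sizes[blk]
    return chosen


# Made-up trace and block sizes, for illustration only.
trace = ["A", "B", "A", "B", "C", "A", "B", "D"]
sizes = {"A": 4, "B": 4, "C": 6, "D": 2}

scores = proximity_scores(trace, window=3)
chosen = select_blocks(scores, sizes, capacity=10)
print(scores.most_common(3))
print(chosen)
```

In this toy trace, blocks A and B alternate in a tight loop, so their pair receives the highest score and both are selected first. A real implementation would also have to decide where in the code to insert the copy instruction that the abstract mentions, which this sketch does not attempt.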