Data space-oriented tiling for enhancing locality

Authors:
I. Kadayif; M. Kandemir
Affiliations:
The Pennsylvania State University, University Park, PA; The Pennsylvania State University, University Park, PA
Venue:
ACM Transactions on Embedded Computing Systems (TECS)
Year:
2005

Abstract

Improving the locality of data references is becoming increasingly important due to the increasing gap between processor cycle times and off-chip memory access latencies. Improving data locality not only improves effective memory access time but also reduces the memory system energy consumed by data references. An optimizing compiler can play an important role in enhancing data locality in array-intensive embedded media applications with regular data access patterns.

This paper presents a compiler-based data space-oriented tiling approach (DST). In this strategy, the data space (e.g., an array of signals) is logically divided into chunks (called data tiles) and each data tile is processed in turn. In processing a data tile, our approach traverses the entire iteration space of all nests in the code and executes all iterations (potentially coming from different nests) that access the data tile being processed. In doing so, it also takes data dependences into account. Since a data space is common across all nests that access it, DST can potentially achieve better results than traditional iteration space (loop) tiling by exploiting internest data locality.

We also present an example application of DST for improving the effectiveness of a scratch pad memory (SPM) for data accesses. SPMs are alternatives to conventional cache memories in the embedded computing world. These small on-chip memories, like caches, provide fast and low-power access to data, but they differ from conventional data caches in that their contents are managed by the compiler rather than by hardware. We have implemented DST in a source-to-source translator and quantified its benefits using a simulator. Our preliminary results with several array-intensive applications and varying input sizes show that our approach outperforms classical iteration space-oriented tiling as well as a data-oriented approach that considers each nest in isolation.
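
The data-tile traversal described in the abstract can be illustrated with a small sketch. This is not the authors' translator or algorithm; it is a hand-written toy, assuming two hypothetical one-dimensional loop nests in which the first writes an array `b` and the second reads it. The function names, the arrays, and the `tile_size` parameter are all illustrative assumptions. The tiled version walks `b` one data tile at a time and, for each tile, runs the iterations from both nests that touch it, keeping producers ahead of consumers so the data dependence is respected.

```python
def run_original(a):
    """Baseline: two loop nests executed one after the other."""
    n = len(a)
    b = [0] * n
    c = [0] * n
    for i in range(n):      # nest 1 writes b
        b[i] = 2 * a[i]
    for j in range(n):      # nest 2 reads b
        c[j] = b[j] + 1
    return b, c


def run_data_tiled(a, tile_size):
    """Toy data space-oriented order: process b one data tile at a time."""
    n = len(a)
    b = [0] * n
    c = [0] * n
    for lo in range(0, n, tile_size):
        hi = min(lo + tile_size, n)     # data tile is b[lo:hi]
        # Iterations of nest 1 that access this data tile.
        for i in range(lo, hi):
            b[i] = 2 * a[i]
        # Iterations of nest 2 that access this data tile. They run after
        # the writes above, so the flow dependence on b is preserved.
        for j in range(lo, hi):
            c[j] = b[j] + 1
    return b, c
```

In this toy, each element of `b` is consumed by the second nest shortly after the first nest produces it, while the tile is still likely to be in fast memory. That reuse across nests is the kind of internest locality the abstract refers to. Both functions compute the same `b` and `c` for any positive `tile_size`.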