POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Shade: a fast instruction-set simulator for execution profiling
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Compiling for numa parallel machines
Compiling for numa parallel machines
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
The Omega Library interface guide
The Omega Library interface guide
Data-centric multi-level blocking
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Non-singular data transformations: definition, validity and applications
ICS '97 Proceedings of the 11th international conference on Supercomputing
Improving locality using loop and data transformations in an integrated framework
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Software controlled power management
CODES '99 Proceedings of the seventh international workshop on Hardware/software codesign
System-level power optimization: techniques and tools
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Dynamic management of scratch-pad memory space
Proceedings of the 38th annual Design Automation Conference
Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
DSP Processors Hit the Mainstream
Computer
Reuse-Driven Tiling for Data Locality
LCPC '97 Proceedings of the 10th International Workshop on Languages and Compilers for Parallel Computing
An Overview of a Compiler for Scalable Parallel Machines
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Efficient Utilization of Scratch-Pad Memory in Embedded Processor Applications
EDTC '97 Proceedings of the 1997 European conference on Design and Test
Intermediately executed code is the key to find refactorings that improve temporal data locality
Proceedings of the 3rd conference on Computing frontiers
Reducing off-chip memory access via stream-conscious tiling on multimedia applications
International Journal of Parallel Programming
MPSoC memory optimization using program transformation
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Buffer and Register Allocation for Memory Space Optimization
Journal of VLSI Signal Processing Systems
A Systematic Approach to Automatically Generate Multiple Semantically Equivalent Program Versions
Ada-Europe '08 Proceedings of the 13th Ada-Europe international conference on Reliable Software Technologies
Hi-index | 0.00 |
Improving locality of data references is becoming increasingly important due to increasing gap between processor cycle times and off-chip memory access latencies. Improving data locality not only improves effective memory access time but also reduces memory system energy consumption due to data references. An optimizing compiler can play an important role in enhancing data locality in array-intensive embedded media applications with regular data access patterns.This paper presents a compiler-based data space-oriented tiling approach (DST). In this strategy, the data space (e.g., an array of signals) is logically divided into chunks (called data tiles) and each data tile is processed in turn. In processing a data tile, our approach traverses the entire iteration space of all nests in the code and executes all iterations (potentially coming from different nests) that access the data tile being processed. In doing so, it also takes data dependences into account. Since a data space is common across all nests that access it, DST can potentially achieve better results than traditional iteration space (loop) tiling by exploiting internest data locality.We also present an example application of DST for improving the effectiveness of a scratch pad memory (SPM) for data accesses. SPMs are alternatives to conventional cache memories in embedded computing world. These small on-chip memories, like caches, provide fast and low-power access to data; but, they differ from conventional data caches in that their contents are managed by compiler instead of hardware. We have implemented DST in a source-to-source translator and quantified its benefits using a simulator. Our preliminary results with several array-intensive applications and varying input sizes show that our approach outperforms classical iteration space-oriented tiling as well as a data-oriented approach that considers each nest in isolation.