It is well known that applying a compiler optimization to a large scope of code (e.g., an entire procedure or function) can bring larger benefits than applying it to a smaller scope (e.g., a nested loop), but code analysis and optimization at larger scopes are also more difficult to manage. Today, the largest scope for a compiler optimization is an entire program source. However, as embedded chip multiprocessor architectures find their way into commercial products, it is becoming important to consider the scenario of multiple applications executing on the same chip multiprocessor. This paper explores a novel technique called multi-compilation, in which multiple applications that are expected to execute simultaneously on the same CMP (chip multiprocessor) are compiled together. A key benefit of this approach is that it captures the interactions among applications due to data sharing. While one can conceive of many optimizations that work in an inter-application fashion by exploiting data sharing across applications, this paper restricts itself to data layout optimization, i.e., the problem of determining the most suitable memory layout for array data. To demonstrate the impact of our contribution, we implemented our approach and performed a simulation-based study with several embedded applications. Our experimental results show that, by selecting the memory layouts of data arrays while considering multiple applications at the same time, we can reduce cache misses by 18.7% and execution cycles by 13.1% on average.