Data locality and parallelism optimization using a constraint-based approach

Authors:
Ozcan Ozturk
Venue:
Journal of Parallel and Distributed Computing
Year:
2011


Abstract

Embedded applications are becoming increasingly complex and process ever-larger datasets. For data-intensive embedded applications, two complementary approaches to improving application behavior have been pursued: data locality optimization and loop-level parallelization. Data locality must be enhanced to maximize the number of data accesses satisfied from the higher levels of the memory hierarchy. Compiler-based code parallelization schemes, on the other hand, require a fresh look for chip multiprocessors, where interprocessor communication is much cheaper than off-chip memory access. A compiler therefore needs to minimize the number of off-chip memory accesses, which can be achieved by considering multiple loop nests simultaneously. Although compilers address both problems, optimizing data locality and parallelism simultaneously is inherently difficult. An integrated approach that combines the two can therefore generate much better results than either approach alone. Based on these observations, this paper proposes a constraint network (CN)-based formulation for data locality optimization and code parallelization. The paper also presents experimental evidence demonstrating the success of the proposed approach and compares the results with those obtained through previously proposed approaches. The experiments with our implementation indicate that the proposed approach is very effective in enhancing data locality and parallelization.
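
To make the idea of a constraint network concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes a toy program with two loop nests and two arrays: each loop nest is a variable whose value is the choice of innermost loop, each array is a variable whose value is its memory layout, and each binary constraint lists the value pairs assumed to give unit-stride access. All names (`DOMAINS`, `CONSTRAINTS`, `consistent`, `solve`), the access patterns, and the plain backtracking search are illustrative assumptions. Parallelization choices are omitted here; they could be modeled as further variables in the same way.

```python
# Hypothetical toy constraint network; every name and value is illustrative.
# Assumed access patterns: nest1 touches A[i][j] and B[j][i]; nest2 touches A[j][i].

# Variables and their domains.
DOMAINS = {
    "nest1": ["i_inner", "j_inner"],    # which loop of nest1 is innermost
    "nest2": ["i_inner", "j_inner"],    # which loop of nest2 is innermost
    "A": ["row_major", "col_major"],    # memory layout of array A
    "B": ["row_major", "col_major"],    # memory layout of array B
}

# Binary constraints: for each (x, y) variable pair, the allowed (x_value, y_value)
# combinations, i.e. those assumed to yield unit-stride accesses.
CONSTRAINTS = {
    ("nest1", "A"): {("j_inner", "row_major"), ("i_inner", "col_major")},
    ("nest1", "B"): {("i_inner", "row_major"), ("j_inner", "col_major")},
    ("nest2", "A"): {("i_inner", "row_major"), ("j_inner", "col_major")},
}


def consistent(var, value, assignment, constraints):
    """Check `value` for `var` against constraints whose other end is assigned."""
    for (x, y), allowed in constraints.items():
        if x == var and y in assignment and (value, assignment[y]) not in allowed:
            return False
        if y == var and x in assignment and (assignment[x], value) not in allowed:
            return False
    return True


def solve(domains, constraints, assignment=None):
    """Plain backtracking search; returns one satisfying assignment or None."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(domains):
        return dict(assignment)
    # Pick the first unassigned variable in declaration order.
    var = next(v for v in domains if v not in assignment)
    for value in domains[var]:
        if consistent(var, value, assignment, constraints):
            assignment[var] = value
            result = solve(domains, constraints, assignment)
            if result is not None:
                return result
            del assignment[var]  # undo and try the next value
    return None


if __name__ == "__main__":
    print(solve(DOMAINS, CONSTRAINTS))
```

In this sketch a solution is any assignment that satisfies all three constraints at once, which is why the loop orders of both nests and the layouts of both arrays have to be chosen together rather than one nest at a time.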