Embedded applications are becoming increasingly complex and are processing ever-larger datasets. For data-intensive embedded applications, there have been two complementary approaches to improving application behavior: optimizing data locality and improving loop-level parallelism. Data locality needs to be enhanced to maximize the number of data accesses satisfied from the higher levels of the memory hierarchy. Compiler-based code parallelization schemes, on the other hand, need to be revisited for chip multiprocessors, where interprocessor communication is much cheaper than off-chip memory access. A compiler for such machines therefore needs to minimize the number of off-chip memory accesses, which can be achieved by considering multiple loop nests simultaneously. Although compilers address each of these problems, optimizing data locality and parallelism simultaneously is inherently difficult. An integrated approach that combines the two can therefore generate much better results than either approach applied individually. Based on these observations, this paper proposes a constraint network (CN)-based formulation for data locality optimization and code parallelization. The paper also presents experimental evidence demonstrating the success of the proposed approach and compares our results with those obtained through previously proposed approaches. The experiments with our implementation indicate that the proposed approach is very effective in enhancing data locality and parallelization.
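The abstract does not spell out the CN formulation itself. The following is a minimal sketch of what "a constraint network for locality and parallelism" can look like under a deliberately toy model, and it is not the paper's actual formulation. Variables are array layouts (`row`/`col`) and loop orders of two hypothetical nests (`ij`/`ji`). Binary constraints encode spatial locality, and one unary constraint encodes an assumed parallelism requirement. The program, variable names, domains, and the `locality` helper are all hypothetical. Because array `A` is shared by both nests, its layout couples them, which mirrors the idea of considering multiple loop nests at once.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A constraint is a scope (tuple of variable names) plus a predicate over their values.
Constraint = Tuple[Tuple[str, ...], Callable[..., bool]]


def consistent(assignment: Dict[str, str], constraints: List[Constraint]) -> bool:
    """Return False if any constraint whose variables are all assigned is violated."""
    for scope, pred in constraints:
        if all(v in assignment for v in scope):
            if not pred(*(assignment[v] for v in scope)):
                return False
    return True


def backtrack(
    variables: Sequence[str],
    domains: Dict[str, List[str]],
    constraints: List[Constraint],
    assignment: Optional[Dict[str, str]] = None,
) -> Optional[Dict[str, str]]:
    """Chronological backtracking: return the first complete consistent assignment, or None."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(variables):
        return dict(assignment)
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        if consistent(assignment, constraints):
            result = backtrack(variables, domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]
    return None


def locality(subscripts: str) -> Callable[[str, str], bool]:
    """Hypothetical spatial-locality constraint for a 2-D reference X[s0][s1].

    Row-major storage makes the last subscript contiguous, column-major the first;
    the reference has good locality when the innermost loop walks that dimension.
    A loop order "ij" means i is the outer loop and j the inner loop.
    """
    def pred(layout: str, order: str) -> bool:
        contiguous = subscripts[-1] if layout == "row" else subscripts[0]
        return contiguous == order[-1]
    return pred


# Hypothetical program: nest1 reads A[i][j] and B[j][i]; nest2 reads A[i][j].
# Array A is shared, so its layout couples the two loop nests.
VARIABLES = ["A", "B", "nest1", "nest2"]
DOMAINS = {
    "A": ["row", "col"],
    "B": ["row", "col"],
    "nest1": ["ij", "ji"],
    "nest2": ["ij", "ji"],
}
CONSTRAINTS: List[Constraint] = [
    (("A", "nest1"), locality("ij")),
    (("B", "nest1"), locality("ji")),
    (("A", "nest2"), locality("ij")),
    # Assumed parallelism constraint: only loop j of nest2 carries no dependence,
    # so it must be the outermost loop for the nest to run in parallel.
    (("nest2",), lambda order: order[0] == "j"),
]

solution = backtrack(VARIABLES, DOMAINS, CONSTRAINTS)
print(solution)  # {'A': 'col', 'B': 'row', 'nest1': 'ji', 'nest2': 'ji'}
```

In this toy instance, the parallelism requirement on `nest2` forces `A` to be column-major. That choice in turn fixes the loop order of `nest1` and the layout of `B`. Solving `nest1` in isolation would instead return a row-major `A`, which conflicts with `nest2`. Plain backtracking is used here only for brevity; practical CN solvers typically add variable-ordering heuristics and constraint propagation.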