Increasing and Detecting Memory Address Congruence

Authors:
Samuel Larsen;Emmett Witchel;Saman P. Amarasinghe
Affiliations:
-;-;-
Venue:
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Year:
2002

Citing 13
Cited 27

Bulldog: a compiler for VLSI architectures

Bulldog: a compiler for VLSI architectures
Introduction to algorithms

Introduction to algorithms
Predicting conditional branch directions from previous runs of a program

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Memory access coalescing: a technique for eliminating redundant memory accesses

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
SUIF: an infrastructure for research on parallelizing and optimizing compilers

ACM SIGPLAN Notices
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Smart Memories: a modular reconfigurable architecture

Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures

Proceedings of the 27th annual international symposium on Computer architecture
Exploiting superword level parallelism with multimedia instruction sets

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Automatic loop interchange

SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Direct addressed caches for reduced power consumption

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Very Long Instruction Word architectures and the ELI-512

ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture

Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Convergent scheduling

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Vectorization for SIMD architectures with alignment constraints

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Superword-Level Parallelism in the Presence of Control Flow

Proceedings of the international symposium on Code generation and optimization
Efficient SIMD Code Generation for Runtime Alignment and Length Conversion

Proceedings of the international symposium on Code generation and optimization
An Empirical Study On the Vectorization of Multimedia Applications for Multimedia Extensions

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Generation of permutations for SIMD processors

LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Optimizing Compiler for the CELL Processor

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Exploiting Vector Parallelism in Software Pipelined Loops

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Multi-platform Auto-vectorization

Proceedings of the International Symposium on Code Generation and Optimization
Optimizing data permutations for SIMD devices

Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Vector LLVA: a virtual vector instruction set for media processing

Proceedings of the 2nd international conference on Virtual execution environments
Compiler-managed partitioned data caches for low power

Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Efficient vectorization of SIMD programs with non-aligned and irregular data access hardware

CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Outer-loop vectorization: revisited for short SIMD architectures

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
A SIMD optimization framework for retargetable compilers

ACM Transactions on Architecture and Code Optimization (TACO)
A case study on compiler optimizations for the Intel® Core™ 2 duo processor

International Journal of Parallel Programming
Grids: a domain for analyzing the distribution of numerical values

LOPSTR'06 Proceedings of the 16th international conference on Logic-based program synthesis and transformation
New algorithms for SIMD alignment

CC'07 Proceedings of the 16th international conference on Compiler construction
Efficient Selection of Vector Instructions Using Dynamic Programming

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Data layout transformation for stencil computations on short-vector SIMD architectures

CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Using machine learning to improve automatic vectorization

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
SIMD defragmenter: efficient ILP realization on data-parallel architectures

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Vapor SIMD: Auto-vectorize once, run everywhere

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Efficient multimedia coprocessor with enhanced SIMD engines for exploiting ILP and DLP

Parallel Computing
Loop Transforming for Reducing Data Alignment on Multi-Core SIMD Processors

Journal of Signal Processing Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

A static memory reference exhibits a unique property when its dynamic memory addresses are congruent with respect to some non-trivial modulus. Extraction of this congruence information at compile-time enables new classes of program optimization. In this paper, we present methods for forcing congruence among the dynamic addresses of a memory reference. We also introduce a compiler algorithm for detecting this property. Our transformations do not require interprocedural analysis and introduce almostno overhead. As a result, they can be incorporated into real compilation systems.On average, our transformations are able to achieve a five-fold increase in the number of congruent memory operations. We are then able to detect 95% of these references. This success is invaluable in providing performance gains in a variety of areas. When congruence information is incorporated into a vectorizing compiler, we can increase the performance of a G4 AltiVec processor up to a factor of two. Using the same methods, we are able to reduce energy consumption in a data cache by as much as 35%.