Optimizing compilers for modern architectures: a dependence-based approach

Authors:
Ken Kennedy; John R. Allen
Venue:
Optimizing compilers for modern architectures: a dependence-based approach
Year:
2001

Citing 0
Cited 326

Folklore confirmed: reducible flow graphs are exponentially larger

POPL '03 Proceedings of the 30th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Reducing Communication Cost for Parallelizing Irregular Scientific Codes

PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing
TEST: a tracer for extracting speculative threads

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A comparison of empirical and model-driven optimization

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Predicting whole-program locality through reuse distance analysis

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
The Jrpm system for dynamically parallelizing Java programs

Proceedings of the 30th annual international symposium on Computer architecture
References

Sourcebook of parallel computing
Transforming Complex Loop Nests for Locality

The Journal of Supercomputing
What can we gain by unfolding loops?

ACM SIGPLAN Notices
Single Assignment C: efficient support for high-level array operations in a functional setting

Journal of Functional Programming
Improving effective bandwidth through compiler enhancement of global cache reuse

Journal of Parallel and Distributed Computing
Single-Dimension Software Pipelining for Multi-Dimensional Loops

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A Dynamically Tuned Sorting Library

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Automatic loop interchange

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Array regrouping and structure splitting using whole-program reference affinity

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Applications of storage mapping optimization to register promotion

Proceedings of the 18th annual international conference on Supercomputing
General loop fusion technique for nested loops considering timing and code size

Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
The Energy Impact of Aggressive Loop Fusion

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Superword-Level Parallelism in the Presence of Control Flow

Proceedings of the international symposium on Code generation and optimization
The Challenges of Hardware Synthesis from C-Like Languages

Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Functional Equivalence Checking for Verification of Algebraic Transformations on Array-Intensive Source Code

Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
The Potential of Computation Regrouping for Improving Locality

Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Automatic measurement of memory hierarchy parameters

SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Automatic blocking of QR and LU factorizations for locality

MSP '04 Proceedings of the 2004 workshop on Memory system performance
Scalarization using loop alignment and loop skewing

The Journal of Supercomputing
Efficient data driven run-time code generation

LCR '04 Proceedings of the 7th workshop on languages, compilers, and run-time support for scalable systems
Shared memory multiprocessor support for functional array processing in SAC

Journal of Functional Programming
Exploiting Inter-Processor Data Sharing for Improving Behavior of Multi-Processor SoCs

ISVLSI '05 Proceedings of the IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design
Architecture-aware classical Taylor shift by 1

Proceedings of the 2005 international symposium on Symbolic and algebraic computation
Computer Architecture: Challenges and Opportunities for the Next Decade

IEEE Micro
Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion

International Journal of High Performance Computing Applications
Contributions to the GNU compiler collection

IBM Systems Journal
Lightweight reference affinity analysis

Proceedings of the 19th annual international conference on Supercomputing
Think globally, search locally

Proceedings of the 19th annual international conference on Supercomputing
Deep Jam: Conversion of Coarse-Grain Parallelism to Instruction-Level and Vector Parallelism for Irregular Applications

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Automatic Thread Extraction with Decoupled Software Pipelining

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Exploiting Vector Parallelism in Software Pipelined Loops

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Quantifying Locality In The Memory Access Patterns of HPC Applications

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Translation and Run-Time Validation of Loop Transformations

Formal Methods in System Design
Automatic functional verification of memory oriented global source code transformations

HLDVT '03 Proceedings of the Eighth IEEE International Workshop on High-Level Design Validation and Test Workshop
A Compiler-Guided Approach for Reducing Disk Power Consumption by Exploiting Disk Access Locality

Proceedings of the International Symposium on Code Generation and Optimization
Compiler Optimizations to Reduce Security Overhead

Proceedings of the International Symposium on Code Generation and Optimization
Auto-vectorization of interleaved data for SIMD

Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Vector LLVA: a virtual vector instruction set for media processing

Proceedings of the 2nd international conference on Virtual execution environments
Parallelization of the data encryption standard (DES) algorithm

Enhanced methods in computer security, biometric and artificial intelligence systems
Power optimizations for the MLCA using dynamic voltage scaling

SCOPES '05 Proceedings of the 2005 workshop on Software and compilers for embedded systems
Reuse analysis of indirectly indexed arrays

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Self-adapting numerical software (SANS) effort

IBM Journal of Research and Development
An empirical evaluation of chains of recurrences for array dependence testing

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
A New Genetic Algorithm for Loop Tiling

The Journal of Supercomputing
In search of a program generator to implement generic transformations for high-performance computing

Science of Computer Programming - Special issue on the first MetaOCaml workshop 2004
Syntax-driven implementation of software programming language control constructs and expressions on FPGAs

CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
SAC: a functional array language for efficient multi-threaded execution

International Journal of Parallel Programming
Violated dependence analysis

Proceedings of the 20th annual international conference on Supercomputing
Complete inlining of recursive calls: beyond tail-recursion elimination

Proceedings of the 44th annual Southeast regional conference
On minimizing materializations of array-valued temporaries

ACM Transactions on Programming Languages and Systems (TOPLAS)
Using fine grain multithreading for energy efficient computing

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Loop pipelining for high-throughput stream computation using self-timed rings

Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
Improving power efficiency with compiler-assisted cache replacement

Journal of Embedded Computing - Cache exploitation in embedded systems
Design space exploration of an optimized compiler approach for a generic reconfigurable array architecture

The Journal of Supercomputing
The rise and fall of High Performance Fortran: an historical object lesson

Proceedings of the third ACM SIGPLAN conference on History of programming languages
Improving locality for ODE solvers by program transformations

Scientific Programming
A unified evaluation framework for coarse grained reconfigurable array architectures

Proceedings of the 4th international conference on Computing frontiers
An experimental comparison of cache-oblivious and cache-conscious programs

Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Optimistic parallelism requires abstractions

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Software behavior oriented parallelization

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
A Dimension Abstraction Approach to Vectorization in Matlab

Proceedings of the International Symposium on Code Generation and Optimization
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time

Proceedings of the International Symposium on Code Generation and Optimization
Predicting locality phases for dynamic memory optimization

Journal of Parallel and Distributed Computing
VLIW instruction scheduling for minimal power variation

ACM Transactions on Architecture and Code Optimization (TACO)
Optimisation Validation

Electronic Notes in Theoretical Computer Science (ENTCS)
Locality optimization in wireless applications

CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Code-size conscious pipelining of imperfectly nested loops

MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Optimistic parallelism benefits from data partitioning

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Parallel-stage decoupled software pipelining

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Spice: speculative parallel iteration chunk execution

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Program optimization space pruning for a multithreaded GPU

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Compiling for vector-thread architectures

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Compiling for an indirect vector register architecture

Proceedings of the 5th conference on Computing frontiers
GPU acceleration of cutoff pair potentials for molecular modeling applications

Proceedings of the 5th conference on Computing frontiers
Optimized mapping for enhancing the operation parallelism in coarse-grained reconfigurable arrays

SMO'06 Proceedings of the 6th WSEAS International Conference on Simulation, Modelling and Optimization
Design of the Java HotSpot™ client compiler for Java 6

ACM Transactions on Architecture and Code Optimization (TACO)
Automatic SIMD vectorization of chains of recurrences

Proceedings of the 22nd annual international conference on Supercomputing
Iterative optimization in the polyhedral model: part II, multidimensional time

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Reasoning about inherent parallelism in modern object-oriented languages

ACSC '08 Proceedings of the thirty-first Australasian conference on Computer science - Volume 74
The impact of paravirtualized memory hierarchy on linear algebra computational kernels and software

HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
XARK: An extensible framework for automatic recognition of computational kernels

ACM Transactions on Programming Languages and Systems (TOPLAS)
Program optimization carving for GPU computing

Journal of Parallel and Distributed Computing
Exploiting Loop-Level Parallelism for SIMD Arrays Using OpenMP

IWOMP '07 Proceedings of the 3rd international workshop on OpenMP: A Practical Programming Model for the Multi-Core Era
Finding Synchronization-Free Parallelism Represented with Trees of Dependent Operations

ICA3PP '08 Proceedings of the 8th international conference on Algorithms and Architectures for Parallel Processing
Finding Synchronization-Free Slices of Operations in Arbitrarily Nested Loops

ICCSA '08 Proceedings of the international conference on Computational Science and Its Applications, Part II
On Validity of Program Transformations in the Java Memory Model

ECOOP '08 Proceedings of the 22nd European conference on Object-Oriented Programming
Revisiting SIMD Programming

Languages and Compilers for Parallel Computing
Language Extensions in Support of Compiler Parallelization

Languages and Compilers for Parallel Computing
Exploiting SIMD Parallelism with the CGiS Compiler Framework

Languages and Compilers for Parallel Computing
Flow-Sensitive Loop-Variant Variable Classification in Linear Time

Languages and Compilers for Parallel Computing
Control flow optimization in loops using interval analysis

CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Guidance of Loop Ordering for Reduced Memory Usage in Signal Processing Applications

Journal of Signal Processing Systems
Address Generation Optimization for Embedded High-Performance Processors: A Survey

Journal of Signal Processing Systems
Outer-loop vectorization: revisited for short SIMD architectures

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Redundancy elimination revisited

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Parallelizing scientific code with invasive interactive parallelization: a case study with reuseware

Proceedings of the 2008 compFrame/HPC-GECO workshop on Component based high performance
On the implementation of automatic differentiation tools

Higher-Order and Symbolic Computation
MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs

Languages and Compilers for Parallel Computing
On the Scalability of an Automatically Parallelized Irregular Application

Languages and Compilers for Parallel Computing
Scalable Implementation of Efficient Locality Approximation

Languages and Compilers for Parallel Computing
How much parallelism is there in irregular applications?

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Software Pipelining in Nested Loops with Prolog-Epilog Merging

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Deriving Efficient Data Movement from Decoupled Access/Execute Specifications

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Automatic Discovery of Coarse-Grained Parallelism in Media Applications

Transactions on High-Performance Embedded Architectures and Compilers I
Architecture-aware optimization targeting multithreaded stream computing

Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
Resource aware mapping on coarse grained reconfigurable arrays

Microprocessors & Microsystems
Design and implementation of a queue compiler

Microprocessors & Microsystems
Compiler assisted architectural exploration framework for coarse grained reconfigurable arrays

The Journal of Supercomputing
Chunking parallel loops in the presence of synchronization

Proceedings of the 23rd international conference on Supercomputing
Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory

Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Towards a holistic approach to auto-parallelization: integrating profile-driven parallelism detection and machine-learning based mapping

Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Fast Track: A Software System for Speculative Program Optimization

Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
On approximating the ideal random access machine by physical machines

Journal of the ACM (JACM)
Program locality analysis using reuse distance

ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimistic parallelism requires abstractions

Communications of the ACM - The Status of the P versus NP Problem
A case for compiler-driven superpage allocation

Proceedings of the 47th Annual Southeast Regional Conference
Extending Automatic Parallelization to Optimize High-Level Abstractions for Multicore

IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Systematic search within an optimisation space based on Unified Transformation Framework

International Journal of Computational Science and Engineering
Automatic parallelization for graphics processing units

PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
Paravirtualization effect on single- and multi-threaded memory-intensive linear algebra software

Cluster Computing
Inferring Dataflow Properties of User Defined Table Processors

SAS '09 Proceedings of the 16th International Symposium on Static Analysis
Optimal loop parallelization for maximizing iteration-level parallelism

CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Three fundamental dimensions of scientific workflow interoperability: Model of computation, language, and execution environment

Future Generation Computer Systems
The Habanero multicore software research project

Proceedings of the 24th ACM SIGPLAN conference companion on Object oriented programming systems languages and applications
Extracting synchronization-free slices of operations in perfectly-nested loops

PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Compact multi-dimensional kernel extraction for register tiling

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
A program auto-parallelizer based on the component technology of optimizing compiler construction

Programming and Computing Software
Parallel loop generation and scheduling

The Journal of Supercomputing
Automatic memory partitioning and scheduling for throughput and power optimization

Proceedings of the 2009 International Conference on Computer-Aided Design
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Speculative parallelization using software multi-threaded transactions

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
MacroSS: macro-SIMDization of streaming applications

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Algorithms for memory hierarchies: advanced lectures

Algorithms for memory hierarchies: advanced lectures
On minimizing register usage of linearly scheduled algorithms with uniform dependencies

Computer Languages, Systems and Structures
Allocation and scheduling of Conditional Task Graphs

Artificial Intelligence
Axis control in SAC

IFL'02 Proceedings of the 14th international conference on Implementation of functional languages
Dependence-based code generation for a CELL processor

LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Exploiting speculative thread-level parallelism in data compression applications

LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
New algorithms for SIMD alignment

CC'07 Proceedings of the 16th international conference on Compiler construction
Loop parallelization in multi-dimensional cartesian space

PSI'06 Proceedings of the 6th international Andrei Ershov memorial conference on Perspectives of systems informatics
Integrating high-level optimizations in a production compiler: design and implementation experience

CC'03 Proceedings of the 12th international conference on Compiler construction
Decoupled software pipelining creates parallelization opportunities

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Exploiting statistical correlations for proactive prediction of program behaviors

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Data dependence analysis for the parallelization of numerical tree codes

PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Hierarchical program representation for program element matching

IDEAL'07 Proceedings of the 8th international conference on Intelligent data engineering and automated learning
Auto-parallelisation of Sieve C++ programs

Euro-Par'07 Proceedings of the 2007 conference on Parallel processing
Speculative parallelization using state separation and multiple value prediction

Proceedings of the 2010 international symposium on Memory management
Experiences in initiating concurrency software research efforts

Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 2
Model-guided empirical tuning of loop fusion

International Journal of High Performance Systems Architecture
Transforming flow information during code optimization for timing analysis

Real-Time Systems
Mapping loop nests to multipipelined architecture

Programming and Computing Software
A profile-based tool for finding pipeline parallelism in sequential programs

Parallel Computing
Simple section interchange and properties of non-computable functions

Science of Computer Programming
Runtime Reconfiguration of Multiprocessors Based on Compile-Time Analysis

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Computation mapping for multi-level storage cache hierarchies

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Reducing task creation and termination overhead in explicitly parallel programs

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
An OpenCL framework for heterogeneous multicores with local memory

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Semi-automatic extraction and exploitation of hierarchical pipeline parallelism using profiling information

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
The Paralax infrastructure: automatic parallelization with a helping hand

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Parallel programming must be deterministic by default

HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
An input-centric paradigm for program dynamic optimizations

Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Dynamic parallelization of recursive code: part 1: managing control flow interactions with the continuator

Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Parallel inclusion-based points-to analysis

Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Inferring arbitrary distributions for data and computation

Proceedings of the ACM international conference companion on Object oriented programming systems languages and applications companion
Energy- and Performance-Efficient Communication Framework for Embedded MPSoCs through Application-Driven Release Consistency

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Application of if-conversion to verification and optimization of workflows

Programming and Computing Software
Engineering scalable, cache and space efficient tries for strings

The VLDB Journal — The International Journal on Very Large Data Bases
Compilers, architectures and synthesis for embedded computing: retrospect and prospect

CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Vectorization for Java

NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Exposing tunable parameters in multi-threaded numerical code

NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Scalable SMT-based verification of GPU kernel functions

Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
Extracting both affine and non-linear synchronization-free slices in program loops

PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
Automatic program parallelization for multicore processors

PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Estimating and exploiting potential parallelism by source-level dependence profiling

EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Source-to-source optimization of CUDA C for GPU accelerated cardiac cell modeling

EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Register allocation with instruction scheduling for VLIW-architectures

Programming and Computing Software
A performance model for fine-grain accesses in UPC

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Design flow for optimizing performance in processor systems with on-chip coarse-grain reconfigurable logic

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Exploring the design space of an optimized compiler approach for mesh-like coarse-grained reconfigurable architectures

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Redesigning the string hash table, burst trie, and BST to exploit cache

Journal of Experimental Algorithmics (JEA)
Automatic memory partitioning and scheduling for throughput and power optimization

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Automatic Parallelization in a Binary Rewriter

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Scalable Speculative Parallelization on Commodity Clusters

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Efficient Selection of Vector Instructions Using Dynamic Programming

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
GLOpenCL: OpenCL support on hardware- and software-managed cache multicores

Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Energy-Aware Loop Parallelism Maximization for Multi-core DSP Architectures

GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Patterns for cache optimizations on multi-processor machines

Proceedings of the 2010 Workshop on Parallel Programming Patterns
Dynamic selection of implementation variants of sequential iterated Runge-Kutta methods with tile size sampling

Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering
Fast analysis of molecular dynamics trajectories with graphics processing units: Radial distribution function histogramming

Journal of Computational Physics
Towards an efficient tile matrix inversion of symmetric positive definite matrices on multicore architectures

VECPAR'10 Proceedings of the 9th international conference on High performance computing for computational science
Lowering STM overhead with static analysis

LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Loop Distribution and Fusion with Timing and Code Size Optimization

Journal of Signal Processing Systems
Importance of explicit vectorization for CPU and GPU software performance

Journal of Computational Physics
Parallel Low-Storage Runge-Kutta Solvers for ODE Systems with Limited Access Distance

International Journal of High Performance Computing Applications
Exploiting the distributed foreground memory in coarse grain reconfigurable arrays for reducing the memory bottleneck in DSP applications

SSIP'05 Proceedings of the 5th WSEAS international conference on Signal, speech and image processing
Data layout transformation for stencil computations on short-vector SIMD architectures

CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Commutative set: a language extension for implicit parallel programming

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
The Tao of parallelism in algorithms

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Parallelism orchestration using DoPE: the degree of parallelism executive

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Automatic parallelization via matrix multiplication

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
ALTER: exploiting breakable dependences for parallelization

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Automatic SIMD vectorization of fast Fourier transforms for the Larrabee and AVX instruction sets

Proceedings of the international conference on Supercomputing
An approximate method for filtering out data dependencies with a sufficiently large distance between memory references

The Journal of Supercomputing
Efficient stack distance computation for priority replacement policies

Proceedings of the 8th ACM International Conference on Computing Frontiers
Understanding stencil code performance on multicore architectures

Proceedings of the 8th ACM International Conference on Computing Frontiers
A reuse-aware prefetching scheme for scratchpad memory

Proceedings of the 48th Design Automation Conference
Repetitive model refactoring strategy for the design space exploration of intensive signal processing applications

Journal of Systems Architecture: the EUROMICRO Journal
An efficient time-step-based self-adaptive algorithm for predictor-corrector methods of Runge-Kutta type

Journal of Computational and Applied Mathematics
Symmetry-aware predicate abstraction for shared-variable concurrent programs

CAV'11 Proceedings of the 23rd international conference on Computer aided verification
Natural instruction level parallelism-aware compiler for high-performance QueueCore processor architecture

The Journal of Supercomputing
Localizing globals and statics to make C programs thread-safe

CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Probabilistically accurate program transformations

SAS'11 Proceedings of the 18th international conference on Static analysis
Safe parallel programming using dynamic dependence hints

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Enhancing locality for recursive traversals of recursive structures

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Combining measures for temporal and spatial locality

ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
Applying data copy to improve memory performance of general array computations

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Automatic measurement of instruction cache capacity

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Combined ILP and register tiling: analytical model and optimization framework

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Analytic models and empirical search: a hybrid approach to code optimization

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Parallelization of utility programs based on behavior phase analysis

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
PLDS: Partitioning linked data structures for parallelism

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Using machine learning to improve automatic vectorization

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
A data locality methodology for matrix-matrix multiplication algorithm

The Journal of Supercomputing
Parallelisation of sequential programs by invasive composition and aspect weaving

APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies
Programmable data dependencies and placements

DAMP '12 Proceedings of the 7th workshop on Declarative aspects and applications of multicore programming
PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation

Parallel Computing
Loop distribution and fusion with timing and code size optimization for embedded DSPs

EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
Induction variable analysis with delayed abstractions

HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Combined loop transformation and hierarchy allocation for data reuse optimization

Proceedings of the International Conference on Computer-Aided Design
Massively parallel programming models used as hardware description languages: the OpenCL case

Proceedings of the International Conference on Computer-Aided Design
Limits of parallelism using dynamic dependency graphs

WODA '09 Proceedings of the Seventh International Workshop on Dynamic Analysis
Optimization of dense matrix multiplication on IBM Cyclops-64: challenges and experiences

Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Generalized index-set splitting

CC'05 Proceedings of the 14th international conference on Compiler Construction
Verification of source code transformations by program equivalence checking

CC'05 Proceedings of the 14th international conference on Compiler Construction
Phase-Based miss rate prediction across program inputs

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Applying loop optimizations to object-oriented abstractions through general classification of array semantics

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Extending the applicability of scalar replacement to multiple induction variables

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
A matrix-type for performance–portability

PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Towards cache-optimized multigrid using patch-adaptive relaxation

PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Efficient execution of time-step computations with pipelined parallelism and inter-thread data locality optimizations

Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Automatic detection of saturation and clipping idioms

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Efficient SIMD code generation for irregular kernels

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
SIMD defragmenter: efficient ILP realization on data-parallel architectures

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
WSQuery: XQuery for web services integration

DASFAA'05 Proceedings of the 10th international conference on Database Systems for Advanced Applications
Performance and scalability analysis of Cray X1 vectorization and multistreaming optimization

ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part I
TVOC: a translation validator for optimizing compilers

CAV'05 Proceedings of the 17th international conference on Computer Aided Verification
Finding basic block and variable correspondence

SAS'05 Proceedings of the 12th international conference on Static Analysis
An algorithm of automatic workflow optimization

Programming and Computing Software
The polyhedral model is more widely applicable than you think

CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Vapor SIMD: Auto-vectorize once, run everywhere

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Neighborhood-aware data locality optimization for NoC-based multicores

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Differential precondition checking: A lightweight, reusable analysis for refactoring tools

ASE '11 Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering
Optimizing data shuffling in data-parallel computation by understanding user-defined functions

NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
The HELIX project: overview and directions

Proceedings of the 49th Annual Design Automation Conference
Optimizing memory hierarchy allocation with loop transformations for high-level synthesis

Proceedings of the 49th Annual Design Automation Conference
Scout: a source-to-source transformator for SIMD-Optimizations

Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing - Volume 2
Automatic privatization for parallel execution of loops

ICAISC'12 Proceedings of the 11th international conference on Artificial Intelligence and Soft Computing - Volume Part II
Parcae: a system for flexible parallel execution

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Dynamic trace-based analysis of vectorization potential of applications

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Logical inference techniques for loop parallelization

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
HELIX: automatic parallelization of irregular programs for chip multiprocessing

Proceedings of the Tenth International Symposium on Code Generation and Optimization
Hierarchical overlapped tiling

Proceedings of the Tenth International Symposium on Code Generation and Optimization
Program analysis and transformation for holistic optimization of database applications

Proceedings of the ACM SIGPLAN International Workshop on State of the Art in Java Program analysis
Automatic restructuring of GPU kernels for exploiting inter-thread data locality

CC'12 Proceedings of the 21st international conference on Compiler Construction
Fast loop-level data dependence profiling

Proceedings of the 26th ACM international conference on Supercomputing
Performance analysis of Intel multiprocessors using astrophysics simulations

Concurrency and Computation: Practice & Experience
Delta Send-Recv for Dynamic Pipelining in MPI Programs

CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Free scheduling for statement instances of parameterized arbitrarily nested affine loops

Parallel Computing
Parallelization of the discrete chaotic block encryption algorithm

PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part II
Financial software on GPUs: between Haskell and Fortran

Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing
Static detection of loop-invariant data structures

ECOOP'12 Proceedings of the 26th European conference on Object-Oriented Programming
From sequential programming to flexible parallel execution

Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Automatically enhancing locality for tree traversals with traversal splicing

Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Spotting code optimizations in data-parallel pipelines through PeriSCOPE

OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Improving Data Locality for Efficient In-Core Path Tracing

Computer Graphics Forum
Compiler-in-the-loop exploration during datapath synthesis for higher quality delay-area trade-offs

ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special section on adaptive power management for energy and temperature-aware computing systems
Model-based testing of optimizing compilers

TestCom'07/FATES'07 Proceedings of the 19th IFIP TC6/WG6.1 international conference, and 7th international conference on Testing of Software and Communicating Systems
Delayed side-effects ease multi-core programming

Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
Locality optimized shared-memory implementations of iterated Runge-Kutta methods

Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
Towards the optimal synchronization granularity for dynamic scheduling of pipelined computations on heterogeneous computing systems

Concurrency and Computation: Practice & Experience
Layout-oblivious compiler optimization for matrix computations

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Improved loop tiling based on the removal of spurious false dependences

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Polyhedral parallel code generation for CUDA

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Exact dependence analysis for increased communication overlap

EuroMPI'12 Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface
Memory partitioning and scheduling co-optimization in behavioral synthesis

Proceedings of the International Conference on Computer-Aided Design
From relational verification to SIMD loop synthesis

Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
A Transformation Framework for Optimizing Task-Parallel Programs

ACM Transactions on Programming Languages and Systems (TOPLAS)
HOTL: a higher order theory of locality

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Practical automatic loop specialization

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Parallel execution of Java loops on Graphics Processing Units

Science of Computer Programming
Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Memory reuse optimizations in the R-Stream compiler

Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
When polyhedral transformations meet SIMD code generation

Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Pacman: program-assisted cache management

Proceedings of the 2013 international symposium on memory management
Computational caches

Proceedings of the 6th International Systems and Storage Conference
Throughput-oriented kernel porting onto FPGAs

Proceedings of the 50th Annual Design Automation Conference
Runtime dependency analysis for loop pipelining in high-level synthesis

Proceedings of the 50th Annual Design Automation Conference
Fast condensation of the program dependence graph

Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
A survey of pipelined workflow scheduling: Models and algorithms

ACM Computing Surveys (CSUR)
A T2 graph-reduction approach to fusion

Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
Semi-automatic restructuring of offloadable tasks for many-core accelerators

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Efficient compilation of CUDA kernels for high-performance computing on FPGAs

ACM Transactions on Embedded Computing Systems (TECS) - Special issue on application-specific processors
Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Automatic OpenCL work-group size selection for multicore CPUs

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Multifrontal QR factorization for multicore architectures over runtime systems

Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
An automatic thread decomposition approach for pipelined multithreading

International Journal of High Performance Computing and Networking
ASC: automatically scalable computation

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Non-affine Extensions to Polyhedral Code Generation

Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Alias control for deterministic parallelism

Aliasing in Object-Oriented Programming
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential

ACM Transactions on Architecture and Code Optimization (TACO)
The Cetus Source-to-Source Compiler Infrastructure: Overview and Evaluation

International Journal of Parallel Programming
Recovering memory access patterns of executable programs

Science of Computer Programming
Optimal eviction policies for stochastic address traces

Theoretical Computer Science
High level transforms for SIMD and low-level computer vision algorithms

Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing
Integrating profile-driven parallelism detection and machine-learning-based mapping

ACM Transactions on Architecture and Code Optimization (TACO)
Accelerating sequential programs on commodity multi-core processors

Journal of Parallel and Distributed Computing

Quantified Score

Hi-index	0.01

Abstract

Modern computer architectures designed with high-performance microprocessors offer tremendous potential gains in performance over previous designs. Yet their very complexity makes it increasingly difficult to produce efficient code and to realize their full potential. This landmark text from two leaders in the field focuses on the pivotal role that compilers can play in addressing this critical issue. The basis for all the methods presented in this book is data dependence, a fundamental compiler analysis tool for optimizing programs on high-performance microprocessors and parallel architectures. It enables compiler designers to write compilers that automatically transform simple, sequential programs into forms that can exploit special features of these modern architectures. The text provides a broad introduction to data dependence, to the many transformation strategies it supports, and to its applications to important optimization problems such as parallelization, compiler memory hierarchy management, and instruction scheduling. The authors demonstrate the importance and wide applicability of dependence-based compiler optimizations and give the compiler writer the basics needed to understand and implement them. For computational scientists and engineers who need the best possible performance from their complex applications, they also offer cookbook explanations of how to transform applications by hand.