High Performance Compilers for Parallel Computing

Authors:
Michael Joseph Wolfe;Carter Shanklin;Leda Ortega
Affiliations:
-;-;-
Venue:
High Performance Compilers for Parallel Computing
Year:
1995

Citing 0
Cited 512

Global communication analysis and optimization

PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Parallel architectures

ACM Computing Surveys (CSUR)
Parallelizing compilers

ACM Computing Surveys (CSUR)
Tuning the performance of I/O-intensive parallel applications

Proceedings of the fourth workshop on I/O in parallel and distributed systems: part of the federated computing research conference
An Implementation Framework for HPF Distributed Arrays on Message-Passing Parallel Computer Systems

IEEE Transactions on Parallel and Distributed Systems
Data-localization for Fortran macro-dataflow computation using partial static task assignment

ICS '96 Proceedings of the 10th international conference on Supercomputing
Eliminating redundant barrier synchronizations in rule-based programs

ICS '96 Proceedings of the 10th international conference on Supercomputing
A new algorithm for partial redundancy elimination based on SSA form

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Data-centric multi-level blocking

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Compiler and run-time support for semi-structured applications

ICS '97 Proceedings of the 11th international conference on Supercomputing
A compiler algorithm for optimizing locality in loop nests

ICS '97 Proceedings of the 11th international conference on Supercomputing
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology

ICS '97 Proceedings of the 11th international conference on Supercomputing
Compile-time minimisation of load imbalance in loop nests

ICS '97 Proceedings of the 11th international conference on Supercomputing
A unified compiler algorithm for optimizing locality, parallelism and communication in out-of-core computations

Proceedings of the fifth workshop on I/O in parallel and distributed systems
Resource sharing in hierarchical synthesis

ICCAD '97 Proceedings of the 1997 IEEE/ACM international conference on Computer-aided design
An approach for exploring code improving transformations

ACM Transactions on Programming Languages and Systems (TOPLAS)
Design space exploration algorithm for heterogeneous multi-processor embedded system design

DAC '98 Proceedings of the 35th annual Design Automation Conference
Register promotion by sparse partial redundancy elimination of loads and stores

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
The implementation and evaluation of fusion and contraction in array languages

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Loop fusion in high performance Fortran

ICS '98 Proceedings of the 12th international conference on Supercomputing
A general algorithm for tiling the register level

ICS '98 Proceedings of the 12th international conference on Supercomputing
The infinity Lambda test

ICS '98 Proceedings of the 12th international conference on Supercomputing
Dependence driven execution for multiprogrammed multiprocessor

ICS '98 Proceedings of the 12th international conference on Supercomputing
Memory size estimation for multimedia applications

Proceedings of the 6th international workshop on Hardware/software codesign
On the Removal of Anti- and Output-Dependences

International Journal of Parallel Programming
Initial Results for Glacial Variable Analysis

International Journal of Parallel Programming
Using interval arithmetic to calculate data sizes for compilation to multimedia instruction sets

MULTIMEDIA '98 Proceedings of the sixth ACM international conference on Multimedia
Improving locality using loop and data transformations in an integrated framework

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Schedule-independent storage mapping for loops

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Improving Cache Locality by a Combination of Loop and Data Transformations

IEEE Transactions on Computers - Special issue on cache memory and related problems
A coordination language for mixed task and data parallel programs

Proceedings of the 1999 ACM symposium on Applied computing
A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts

IEEE Transactions on Parallel and Distributed Systems
Synthesizing Efficient Out-of-Core Programs for Block Recursive Algorithms Using Block-Cyclic Data Distributions

IEEE Transactions on Parallel and Distributed Systems
Code motion for explicitly parallel programs

Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Static single assignment form for machine code

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
New tiling techniques to improve cache temporal locality

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
High-level semantic optimization of numerical codes

ICS '99 Proceedings of the 13th international conference on Supercomputing
An experimental evaluation of tiling and shackling for memory hierarchy management

ICS '99 Proceedings of the 13th international conference on Supercomputing
An integer linear programming approach for optimizing cache locality

ICS '99 Proceedings of the 13th international conference on Supercomputing
Selecting tile shape for minimal execution time

Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
An automated temporal partitioning and loop fission approach for FPGA based reconfigurable synthesis of DSP applications

Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Accelerating APL programs with SAC

Proceedings of the conference on APL '99: On track to the 21st century
The Superthreaded Processor Architecture

IEEE Transactions on Computers
Partial redundancy elimination in SSA form

ACM Transactions on Programming Languages and Systems (TOPLAS)
Nonsingular Data Transformations: Definition, Validity, and Applications

International Journal of Parallel Programming
On defining application-specific high-level array operations by means of shape-invariant programming facilities

APL '98 Proceedings of the APL98 conference on Array processing language
Probabilistic Loop Scheduling for Applications with Uncertain Execution Time

IEEE Transactions on Computers
A global communication optimization technique based on data-flow analysis and linear algebra

ACM Transactions on Programming Languages and Systems (TOPLAS)
A case for source-level transformations in MATLAB

Proceedings of the 2nd conference on Domain-specific languages
Synthesizing transformations for locality enhancement of imperfectly-nested loop nests

Proceedings of the 14th international conference on Supercomputing
ZPL: A Machine Independent Programming Language for Parallel Computers

IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
Influence of compiler optimizations on system power

Proceedings of the 37th Annual Design Automation Conference
A Transformation Approach to Derive Efficient Parallel Implementations

IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
Timing Analysis for Data and Wrap-Around Fill Caches

Real-Time Systems
Energy-driven integrated hardware-software optimizations using SimplePower

Proceedings of the 27th annual international symposium on Computer architecture
Statement-Level Communication-Free Partitioning Techniques for Parallelizing Compilers

The Journal of Supercomputing
A Loop Transformation Algorithm for Communication Overlapping

International Journal of Parallel Programming - Special issue on international symposium on high performance computing 1997, part I
An integrated temporal partitioning and partial reconfiguration technique for design latency improvement

DATE '00 Proceedings of the conference on Design, automation and test in Europe
Design-Space Exploration for Block-Processing Based Temporal Partitioning of Run-Time Reconfigurable Systems

Journal of VLSI Signal Processing Systems - Special issue on VLSI on custom computing technology
Memory system energy (poster session): influence of hardware-software optimizations

ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Supporting Timing Analysis by Automatic Bounding of Loop Iterations

Real-Time Systems - Special issue on worst-case execution-time analysis
Cacheminer: A Runtime Approach to Exploit Cache Locality on SMP

IEEE Transactions on Parallel and Distributed Systems
Properties and Algorithms for Unfolding of Probabilistic Data-Flow Graphs

Journal of VLSI Signal Processing Systems
A Unified Framework for Optimizing Locality, Parallelism, and Communication in Out-of-Core Computations

IEEE Transactions on Parallel and Distributed Systems
Minimizing Data and Synchronization Costs in One-Way Communication

IEEE Transactions on Parallel and Distributed Systems
A compiler technique for improving whole-program locality

POPL '01 Proceedings of the 28th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Matching and searching analysis for parallel hardware implementation on FPGAs

FPGA '01 Proceedings of the 2001 ACM/SIGDA ninth international symposium on Field programmable gate arrays
Optimizing memory usage in the polyhedral model

ACM Transactions on Programming Languages and Systems (TOPLAS)
Transformations for imperfectly nested loops

Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Tiling imperfectly-nested loop nests

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Compiler-directed selection of dynamic memory layouts

Proceedings of the ninth international symposium on Hardware/software codesign
Exploiting non-uniform reuse for cache optimization

Proceedings of the 2001 ACM symposium on Applied computing
A dynamic locality optimization algorithm for linear algebra codes

Proceedings of the 2001 ACM symposium on Applied computing
Data and memory optimization techniques for embedded systems

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Fractal symbolic analysis

ICS '01 Proceedings of the 15th international conference on Supercomputing
Data locality enhancement by memory reduction

ICS '01 Proceedings of the 15th international conference on Supercomputing
Loop optimization for a class of memory-constrained computations

ICS '01 Proceedings of the 15th international conference on Supercomputing
Optimizing locality for ODE solvers

ICS '01 Proceedings of the 15th international conference on Supercomputing
Global optimization techniques for automatic parallelization of hybrid applications

ICS '01 Proceedings of the 15th international conference on Supercomputing
Static Single Assignment Form for Message-Passing Programs

International Journal of Parallel Programming
Reducing memory requirements of nested loops for embedded systems

Proceedings of the 38th annual Design Automation Conference
Computational power of pipelined memory hierarchies

Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Dynamic management of scratch-pad memory space

Proceedings of the 38th annual Design Automation Conference
Fast bit-true simulation

Proceedings of the 38th annual Design Automation Conference
Blocking and array contraction across arbitrarily nested loops using affine partitioning

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Loop parallelization algorithms

Compiler optimizations for scalable parallel systems
Interprocedural analysis based on guarded array regions

Compiler optimizations for scalable parallel systems
Communication-free partitioning of nested loops

Compiler optimizations for scalable parallel systems
Solving alignment using elementary linear algebra

Compiler optimizations for scalable parallel systems
Compiler support for block buffering

ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
The NINJA project

Communications of the ACM
Morphable Cache Architectures: Potential Benefits

OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
Source code transformation based on software cost analysis

Proceedings of the 14th international symposium on Systems synthesis
The Efficient Computation of Ownership Sets in HPF

IEEE Transactions on Parallel and Distributed Systems
A novel approach to code analysis of digital signal processing systems

CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Influence of compiler optimizations on system power

IEEE Transactions on Very Large Scale Integration (VLSI) Systems - System Level Design
Cost-Conscious Strategies to Increase Performance of Numerical Programs on Aggressive VLIW Architectures

IEEE Transactions on Computers
Static and Dynamic Locality Optimizations Using Integer Linear Programming

IEEE Transactions on Parallel and Distributed Systems
Automatic Compilation of Loops to Exploit Operator Parallelism on Configurable Arithmetic Logic Units

IEEE Transactions on Parallel and Distributed Systems
Data Relation Vectors: A New Abstraction for Data Optimizations

IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Hardware and Software Techniques for Controlling DRAM Power Modes

IEEE Transactions on Computers
On optimal temporal locality of stencil codes

Proceedings of the 2002 ACM symposium on Applied computing
Tera hardware-software cooperation

SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Loop re-ordering and pre-fetching at run-time

SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Automatic data and computation decomposition on distributed memory parallel computers

ACM Transactions on Programming Languages and Systems (TOPLAS)
Mapping a Single Assignment Programming Language to Reconfigurable Systems

The Journal of Supercomputing
Skewed Data Partition and Alignment Techniques for Compiling Programs on Distributed Memory Multicomputers

The Journal of Supercomputing
Energy-conscious compilation based on voltage scaling

Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
An energy saving strategy based on adaptive loop parallelization

Proceedings of the 39th annual Design Automation Conference
Exploiting shared scratch pad memory space in embedded multiprocessor systems

Proceedings of the 39th annual Design Automation Conference
Exploiting operation level parallelism through dynamically reconfigurable datapaths

Proceedings of the 39th annual Design Automation Conference
Compiler-directed scratch pad memory hierarchy design and management

Proceedings of the 39th annual Design Automation Conference
An integer linear programming based approach for parallelizing applications in On-chip multiprocessors

Proceedings of the 39th annual Design Automation Conference
Optimal organizations for pipelined hierarchical memories

Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Synthesizing Transformations for Locality Enhancement of Imperfectly-Nested Loop Nests

International Journal of Parallel Programming
Communication-Free Alignment for Array References with Linear Subscripts in Three Loop Index Variables or Quadratic Subscripts

The Journal of Supercomputing
Register tiling in nonrectangular iteration spaces

ACM Transactions on Programming Languages and Systems (TOPLAS)
Memory Design and Exploration for Low Power, Embedded Systems

Journal of VLSI Signal Processing Systems - Special issue on signal processing systems design and implementation
Automated design synthesis and partitioning for adaptive reconfigurable hardware

Hardware implementation of intelligent systems
Optimizing inter-nest data locality

CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Increasing temporal locality with skewing and recursive blocking

Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Automatic intra-register vectorization for the Intel architecture

International Journal of Parallel Programming
An I/O-Conscious Tiling Strategy for Disk-Resident Data Sets

The Journal of Supercomputing
Precise Data Locality Optimization of Nested Loops

The Journal of Supercomputing
Parallel symbolic computation in ACE

Annals of Mathematics and Artificial Intelligence
Correctness properties in a shared-memory parallel language

Journal of the ACM (JACM)
Improving memory energy using access pattern classification

Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Compiler Support for Array Distribution on NUMA Shared Memory Multiprocessors

The Journal of Supercomputing
Processor Array Synthesis from Shift-Variant Deep Nested Do Loops

The Journal of Supercomputing
Unified Interprocedural Parallelism Detection

International Journal of Parallel Programming
Compilation Techniques for Multimedia Processors

International Journal of Parallel Programming
A Vectorizing Compiler for Multimedia Extensions

International Journal of Parallel Programming
Path Analysis and Renaming for Predicated Instruction Scheduling

International Journal of Parallel Programming
Combining Loop Transformations Considering Caches and Scheduling

International Journal of Parallel Programming
Reuse-Driven Tiling for Improving Data Locality

International Journal of Parallel Programming
Data-Centric Transformations for Locality Enhancement

International Journal of Parallel Programming
Code Transformations for Data Transfer and Storage Exploration Preprocessing in Multimedia Processors

IEEE Design & Test
A Layout-Conscious Iteration Space Transformation Technique

IEEE Transactions on Computers
Loop Restructuring for Data I/O Minimization on Limited On-Chip Memory Embedded Processors

IEEE Transactions on Computers
A Parallelization Domain Oriented Multilevel Graph Partitioner

IEEE Transactions on Computers
Generation of Injective and Reversible Modular Mappings

IEEE Transactions on Parallel and Distributed Systems
Parallelizing graph construction operations in programs with cyclic graphs

Parallel Computing
Generating communication sets of array assignment statements for block-cyclic distribution on distributed memory parallel computers

Parallel Computing
Evaluating Integrated Hardware-Software Optimizations Using a Unified Energy Estimation Framework

IEEE Transactions on Computers
Parallel multiplication of a vector by a kronecker product of matrices

Parallel numerical linear algebra
Data Space Oriented Tiling

ESOP '02 Proceedings of the 11th European Symposium on Programming Languages and Systems
Rescheduling for Locality in Sparse Matrix Computations

ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
A Correction Method for Parallel Loop Execution

ICCS '02 Proceedings of the International Conference on Computational Science-Part I
Automatic generation of injective modular mappings

ICPP '97 Proceedings of the International Conference on Parallel Processing
Improving the Performance of Out-of-Core Computations

ICPP '97 Proceedings of the International Conference on Parallel Processing
Sassy: A Language and Optimizing Compiler for Image Processing on Reconfigurable Computing Systems

ICVS '99 Proceedings of the First International Conference on Computer Vision Systems
Mapping Techniques for Parallel Evaluation of Chains of Recurrences

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Compiler-Directed I/O Optimization

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A Compile-Time Partitioning Strategy for Non-Rectangular Loop Nests

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Compiler and Runtime Support for Irregular Reductions on a Multithreaded Architecture

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Next Generation System Software for Future High-End Computing Systems

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Pipelining Wavefront Computations: Experiences and Performance

IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
A Loop Transformation Algorithm Based on Explicit Data Layout Representation for Optimizing Locality

LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
I/O Granularity Transformations

LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Automatic Analysis of Loops to Exploit Operator Parallelism on Reconfigurable Systems

LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
The I+ Test

LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
The Access Region Test

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
An Analytical Comparison of the I-Test and Omega Test

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Language Support for Pipelining Wavefront Computations

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
A Compiler Framework for Tiling Imperfectly-Nested Loops

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Designing the Agassiz Compiler for Concurrent Multithreaded Architectures

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
A Comparative Analysis of Dependence Testing Mechanisms

LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Experimental Evaluation of Energy Behavior of Iteration Space Tiling

LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Automatic Coarse Grain Task Parallel Processing on SMP Using OpenMP

LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Compiler-Directed Dynamic Frequency and Voltage Scheduling

PACS '00 Proceedings of the First International Workshop on Power-Aware Computer Systems-Revised Papers
A Technique for Parallel Loop Execution

PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing
Software Bubbles: Using Predication to Compensate for Aliasing in Software Pipelines

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Interprocedural Transformations for Extracting Maximum Parallelism

ADVIS '02 Proceedings of the Second International Conference on Advances in Information Systems
A Characterization of Temporal Locality and Its Portability across Memory Hierarchies

ICALP '01 Proceedings of the 28th International Colloquium on Automata, Languages and Programming
Skewed Data Partition and Alignment Techniques for Compiling Programs on Distributed Memory Multicomputers

ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Coarse-Grain Task Parallel Processing Using the OpenMP Backend of the OSCAR Multigrain Parallelizing Compiler

ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Generation of Distributed Loop Control

Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation - SAMOS
Complexity of Multi-dimensional Loop Alignment

STACS '02 Proceedings of the 19th Annual Symposium on Theoretical Aspects of Computer Science
Parameter-Induced Aliasing in Ada

Ada Europe '01 Proceedings of the 6th Ada-Europe International Conference Leuven on Reliable Software Technologies
Left-Looking to Right-Looking and Vice Versa: An Application of Fractal Symbolic Analysis to Linear Algebra Code Restructuring

Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Temporary Arrays for Distribution of Loops with Control Dependences

Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Data Sequence Locality: A Generalization of Temporal Locality

Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Heterogeneous Clustered Processors: Organisation and Design

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
I/O-Conscious Tiling for Disk-Resident Data Sets

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Scheduling Iterative Programs onto LogP-Machine

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Automatic SIMD Parallelization of Embedded Applications Based on Pattern Recognition

Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Verification of Basic Block Schedules Using RTL Transformations

CHARME '01 Proceedings of the 11th IFIP WG 10.5 Advanced Research Working Conference on Correct Hardware Design and Verification Methods
Enhancing Compiler Techniques for Memory Energy Optimizations

EMSOFT '02 Proceedings of the Second International Conference on Embedded Software
Towards Energy-Aware Iteration Space Tiling

LCTES '00 Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems
A Holistic Approach to System Level Energy Optimization

PATMOS '00 Proceedings of the 10th International Workshop on Integrated Circuit Design, Power and Timing Modeling, Optimization and Simulation
Using Cohort-Scheduling to Enhance Server Performance

ATEC '02 Proceedings of the General Track of the annual conference on USENIX Annual Technical Conference
Fractal Matrix Multiplication: A Case Study on Portability of Cache Performance

WAE '01 Proceedings of the 5th International Workshop on Algorithm Engineering
A Framework for Loop Distribution on Limited On-Chip Memory Processors

CC '00 Proceedings of the 9th International Conference on Compiler Construction
Advanced Scalarization of Array Syntax

CC '00 Proceedings of the 9th International Conference on Compiler Construction
Automatic Removal of Array Memory Leaks in Java

CC '00 Proceedings of the 9th International Conference on Compiler Construction
Software Pipelining of Nested Loops

CC '01 Proceedings of the 10th International Conference on Compiler Construction
Efficient Symbolic Analysis for Optimizing Compilers

CC '01 Proceedings of the 10th International Conference on Compiler Construction
Influence of Loop Optimizations on Energy Consumption of Multi-bank Memory Systems

CC '02 Proceedings of the 11th International Conference on Compiler Construction
A Case Study: Effects of WITH-Loop-Folding on the NAS Benchmark MG in SAC

IFL '98 Selected Papers from the 10th International Workshop on Implementation of Functional Languages
Improving Locality in Out-of-Core Computations Using Data Layout Transformations

LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Static Analysis for Guarded Code

LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
High Level Programming Methodologies for Data Intensive Computations

LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Optimizing Mutual Exclusion Synchronization in Explicitly Parallel Programs

LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
An Efficient Technique of Instruction Scheduling on a Superscalar-Based Multiprocessor

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Algorithms for computing the static single assignment form

Journal of the ACM (JACM)
On the Parallel Execution Time of Tiled Loops

IEEE Transactions on Parallel and Distributed Systems
Reducing False Sharing and Improving Spatial Locality in a Unified Compilation Framework

IEEE Transactions on Parallel and Distributed Systems
Overlap of computation and communication on shared-memory networks-of-workstations

Cluster computing
QR factorization for shared memory and message passing

Parallel Computing
Dynamic compilation for energy adaptation

Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design
Locality-conscious process scheduling in embedded systems

Proceedings of the tenth international symposium on Hardware/software codesign
Compiler-directed instruction cache leakage optimization

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Interprocedural optimizations for improving data cache performance of array-intensive embedded applications

Proceedings of the 40th annual Design Automation Conference
METRIC: tracking down inefficiencies in the memory hierarchy via binary rewriting

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
TEST: a tracer for extracting speculative threads

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Optimization of Data Distribution and Processor Allocation Problem Using Simulated Annealing

The Journal of Supercomputing
A compiler approach for reducing data cache energy

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
A GSA-based compiler infrastructure to extract parallelism from complex loops

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Compiler optimizations for low power systems

Power aware computing
Address code generation for DSP instruction-set architectures

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Cone Based Clustering for List Scheduling Algorithms

EDTC '97 Proceedings of the 1997 European conference on Design and Test
Pipeline Vectorization for Reconfigurable Systems

FCCM '99 Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Automatic Synthesis of Data Storage and Control Structures for FPGA-Based Computing Engines

FCCM '00 Proceedings of the 2000 IEEE Symposium on Field-Programmable Custom Computing Machines
An interprocedural framework for determining efficient data redistributions in distributed memory machines

FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
A transformation method to reduce loop overhead in HPF compiler

HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Detection of Implicit Parallelisms in the Task Parallel Language

HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
A New Transformation Method to Generate Optimized DO Loop from FORALL Construct

PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
Compiler-Directed Array Interleaving for Reducing Energy in Multi-Bank Memories

ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
Strategies for Improving Data Locality in Embedded Applications

ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
Identifying parallelism in programs with cyclic graphs

Journal of Parallel and Distributed Computing
Extracting Parallelism in Nested Loops

COMPSAC '96 Proceedings of the 20th Conference on Computer Software and Applications
The Generalized Lambda Test

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements

IEEE Transactions on Computers
Mapping deep nested do-loop DSP algorithms to large scale FPGA array structures

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Fractal symbolic analysis

ACM Transactions on Programming Languages and Systems (TOPLAS)
Programming skills for a changing world: back to the basics

Journal of Computing Sciences in Colleges
Vectorizing for a SIMdD DSP architecture

Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
A scalable wide-issue clustered VLIW with a reconfigurable interconnect

Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Exploiting bank locality in multi-bank memories

Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Array Regrouping and Its Use in Compiling Data-Intensive Embedded Applications

IEEE Transactions on Computers
Using Elementary Linear Algebra to Solve Data Alignment for Arrays with Linear or Quadratic References

IEEE Transactions on Parallel and Distributed Systems
A Quantitative Analysis of Tile Size Selection Algorithms

The Journal of Supercomputing
Parallel Processing of First Order Linear Recurrence on SMP Machines

The Journal of Supercomputing
What can we gain by unfolding loops?

ACM SIGPLAN Notices
Single Assignment C: efficient support for high-level array operations in a functional setting

Journal of Functional Programming
Impact of Data Transformations on Memory Bank Locality

Proceedings of the conference on Design, automation and test in Europe - Volume 1
Access Pattern Restructuring for Memory Energy

IEEE Transactions on Parallel and Distributed Systems
Linear data distribution based on index analysis

High performance scientific and engineering computing
Reducing instruction cache energy consumption using a compiler-based strategy

ACM Transactions on Architecture and Code Optimization (TACO)
Automatic loop interchange

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Storage requirement estimation for optimized design of data intensive applications

ACM Transactions on Design Automation of Electronic Systems (TODAES)
LODS: locality-oriented dynamic scheduling for on-chip multiprocessors

Proceedings of the 41st annual Design Automation Conference
Data compression for improving SPM behavior

Proceedings of the 41st annual Design Automation Conference
A unified framework for nonlinear dependence testing and symbolic analysis

Proceedings of the 18th annual international conference on Supercomputing
Synthesis of Heterogeneous Distributed Architectures for Memory-Intensive Applications

Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
Array Composition and Decomposition for Optimizing Embedded Applications

Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
An innovative low-power high-performance programmable signal processor for digital communications

IBM Journal of Research and Development
Compiler-directed code restructuring for reducing data TLB energy

Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Improving Data Locality by Array Contraction

IEEE Transactions on Computers
An extended ANSI C for processors with a multimedia extension

International Journal of Parallel Programming
Runtime Code Parallelization for On-Chip Multiprocessors

DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Generalized Data Transformations for Enhancing Cache Behavior

DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
General loop fusion technique for nested loops considering timing and code size

Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Quasidynamic Layout Optimizations for Improving Data Locality

IEEE Transactions on Parallel and Distributed Systems
Multi-node broadcasting in all-ported 3-D wormhole-routed torus using an aggregation-then-distribution strategy

Journal of Systems Architecture: the EUROMICRO Journal
Automatic tiling of iterative stencil loops

ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimizing Address Code Generation for Array-Intensive DSP Applications

Proceedings of the international symposium on Code generation and optimization
A Constraint Network Based Approach to Memory Layout Optimization

Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
A Compiler Analysis of Interprocedural Data Communication

Proceedings of the 2003 ACM/IEEE conference on Supercomputing
New Complexity Results on Array Contraction and Related Problems

Journal of VLSI Signal Processing Systems
Exploitation of parallelism to nested loops with dependence cycles

Journal of Systems Architecture: the EUROMICRO Journal
Optimizing Array-Intensive Applications for On-Chip Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Impact of Compiler-based Data-Prefetching Techniques on SPEC OMP Application Performance

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
An Application Analysis Framework For Polymorphic Chip Multiprocessors

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Software-Directed Disk Power Management for Scientific Applications

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Bandwidth Management with a Reconfigurable Data Cache

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
A two-level scheduling method: an effective parallelizing technique for uniform nested loops on a DSP multiprocessor

Journal of Systems and Software - Special issue: Software engineering education and training
Toward an automatic parallelization of sparse matrix computations

Journal of Parallel and Distributed Computing
An efficient way to filter out data dependences with a sufficiently large distance between memory references

ACM SIGPLAN Notices
A sample-based cache mapping scheme

LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Compiling for memory emergency

LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
A linear-time algorithm for optimal barrier placement

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
A novel approach for partitioning iteration spaces with variable densities

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Exposing disk layout to compiler for reducing energy consumption of parallel disk based systems

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Shared memory multiprocessor support for functional array processing in SAC

Journal of Functional Programming
Data space-oriented tiling for enhancing locality

ACM Transactions on Embedded Computing Systems (TECS)
Reducing 3D Fast Wavelet Transform Execution Time Using Blocking and the Streaming SIMD Extensions

Journal of VLSI Signal Processing Systems
Generating cache hints for improved program efficiency

Journal of Systems Architecture: the EUROMICRO Journal
Improving whole-program locality using intra-procedural and inter-procedural transformations

Journal of Parallel and Distributed Computing
Parallel processing

Encyclopedia of Computer Science
An evaluation of code and data optimizations in the context of disk power reduction

ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Dataflow analysis for energy-efficient scratch-pad memory management

ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
A polynomial-time algorithm for memory space reduction

International Journal of Parallel Programming
Automatic array partitioning based on the Smith normal form

International Journal of Parallel Programming
Increasing on-chip memory space utilization for embedded chip multiprocessors through data compression

CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Contributions to the GNU compiler collection

IBM Systems Journal
Facilitating the search for compositions of program transformations

Proceedings of the 19th annual international conference on Supercomputing
TAPE: a transactional application profiling environment

Proceedings of the 19th annual international conference on Supercomputing
Disk layout optimization for reducing energy consumption

Proceedings of the 19th annual international conference on Supercomputing
Obtaining Affine Transformations to Improve Locality of Loop Nests

Programming and Computing Software
Exploiting Vector Parallelism in Software Pipelined Loops

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Hierarchical submission in a Grid environment

MGC '05 Proceedings of the 3rd international workshop on Middleware for grid computing
Generation of sentences with their parses: the case of propagating scattered context grammars

Acta Cybernetica
High-level synthesis using computation-unit integrated memories

Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
Compiler-Guided data compression for reducing memory consumption of embedded applications

ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Look left, look right, look left again: an application of fractal symbolic analysis to linear algebra code restructuring

International Journal of Parallel Programming
Data dependence analysis techniques for increased accuracy and extracted parallelism

International Journal of Parallel Programming - Special issue II: The 17th annual international conference on supercomputing (ICS'03)
On combining iteration space tiling with data space tiling for scratch-pad memory systems

Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Multi-platform Auto-vectorization

Proceedings of the International Symposium on Code Generation and Optimization
Improving the energy behavior of block buffering using compiler optimizations

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Shared Scratch-Pad Memory Space Management

ISQED '06 Proceedings of the 7th International Symposium on Quality Electronic Design
Energy-aware data prefetching for multi-speed disks

Proceedings of the 3rd conference on Computing frontiers
Multi-compilation: capturing interactions among concurrently-executing applications

Proceedings of the 3rd conference on Computing frontiers
Compiler-directed voltage scaling on communication links for reducing power consumption

ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Improving scratch-pad memory reliability through compiler-guided data block duplication

ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Dynamic scratch-pad memory management for irregular array access patterns

Proceedings of the conference on Design, automation and test in Europe: Proceedings
A compiler for exploiting nested parallelism in OpenMP programs

Parallel Computing - OpenMP
Reducing code size through address register assignment

ACM Transactions on Embedded Computing Systems (TECS)
Auto-vectorization of interleaved data for SIMD

Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Optimizing locality and scalability of embedded Runge-Kutta solvers using block-based pipelining

Journal of Parallel and Distributed Computing
Reducing energy consumption of multiprocessor SoC architectures by exploiting memory bank locality

ACM Transactions on Design Automation of Electronic Systems (TODAES)
An empirical evaluation of chains of recurrences for array dependence testing

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Observability Statement Coverage Based on Dynamic Factored Use-Definition Chains for Functional Verification

Journal of Electronic Testing: Theory and Applications
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies

International Journal of Parallel Programming
SAC: a functional array language for efficient multi-threaded execution

International Journal of Parallel Programming
Violated dependence analysis

Proceedings of the 20th annual international conference on Supercomputing
Toward efficient flow-sensitive induction variable analysis and dependence testing for loop optimization

Proceedings of the 44th annual Southeast regional conference
On minimizing materializations of array-valued temporaries

ACM Transactions on Programming Languages and Systems (TOPLAS)
A memory model for scientific algorithms on graphics processors

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
FFT program generation for shared memory: SMP and multicore

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Tight analysis of the performance potential of thread speculation using SPEC CPU 2006

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Voltage Assignment with Guaranteed Probability Satisfying Timing Constraint for Real-time Multiprocessor DSP

Journal of VLSI Signal Processing Systems
Scheduling of Iterative Algorithms with Matrix Operations for Efficient FPGA Design - Implementation of Finite Interval Constant Modulus Algorithm

Journal of VLSI Signal Processing Systems
Efficient control generation for mapping nested loop programs onto processor arrays

Journal of Systems Architecture: the EUROMICRO Journal
Incorporating Intel® MMX™ technology into a Java™ JIT compiler

Scientific Programming
A transparent runtime data distribution engine for OpenMP

Scientific Programming
Case study on algebraic software methodologies for scientific computing

Scientific Programming
Compiler optimization techniques for OpenMP programs

Scientific Programming
NINJA: Java for high performance numerical computing

Scientific Programming
Interprocedural definition-use chains of dynamic pointer-linked data structures

Scientific Programming
Improving locality for ODE solvers by program transformations

Scientific Programming
Reducing off-chip memory access via stream-conscious tiling on multimedia applications

International Journal of Parallel Programming
A Dimension Abstraction Approach to Vectorization in Matlab

Proceedings of the International Symposium on Code Generation and Optimization
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time

Proceedings of the International Symposium on Code Generation and Optimization
Compiler-Directed Variable Latency Aware SPM Management to Cope With Timing Problems

Proceedings of the International Symposium on Code Generation and Optimization
Interactive presentation: A process splitting transformation for Kahn process networks

Proceedings of the conference on Design, automation and test in Europe
Memory bank aware dynamic loop scheduling

Proceedings of the conference on Design, automation and test in Europe
A case for source-level transformations in MATLAB

DSL'99 Proceedings of the 2nd Conference on Domain-Specific Languages - Volume 2
Sensitivity analysis for automatic parallelization on multi-cores

Proceedings of the 21st annual international conference on Supercomputing
A memory-conscious code parallelization scheme

Proceedings of the 44th annual Design Automation Conference
Designer-controlled generation of parallel and flexible heterogeneous MPSoC specification

Proceedings of the 44th annual Design Automation Conference
Design and DSP implementation of fixed-point systems

EURASIP Journal on Applied Signal Processing
Efficient implementation of nested-loop multimedia algorithms

EURASIP Journal on Applied Signal Processing
Forma: A framework for safe automatic array reshaping

ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiler-Directed Energy Optimization for Parallel Disk Based Systems

IEEE Transactions on Parallel and Distributed Systems
Cache-efficient numerical algorithms using graphics hardware

Parallel Computing
Canonical scattered context generators of sentences with their parses

Theoretical Computer Science
Improving the parallelism of iterative methods by aggressive loop fusion

The Journal of Supercomputing
Quantifying ILP by means of graph theory

Proceedings of the 2nd international conference on Performance evaluation methodologies and tools
Energy minimization with loop fusion and multi-functional-unit scheduling for multidimensional DSP

Journal of Parallel and Distributed Computing
Optimization of memory system in real-time embedded systems

ICCOMP'07 Proceedings of the 11th WSEAS International Conference on Computers
Foundations for the integration of scheduling techniques into compilers for parallel languages

International Journal of Computational Science and Engineering
A method to derive the cache performance of irregular applications on machines with direct mapped caches

International Journal of Computational Science and Engineering
Improving I/O performance of applications through compiler-directed code restructuring

FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Compiling for an indirect vector register architecture

Proceedings of the 5th conference on Computing frontiers
Iterative optimization in the polyhedral model: part II, multidimensional time

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Generation of heterogeneous distributed architectures for memory-intensive applications through high-level synthesis

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A general data dependence analysis for parallelizing compilers

The Journal of Supercomputing
XARK: An extensible framework for automatic recognition of computational kernels

ACM Transactions on Programming Languages and Systems (TOPLAS)
Prefetch throttling and data pinning for improving performance of shared caches

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Composition of Loop Modules in the Structural Blanks Approach to Programming with Recurrences: A Task of Synthesis of Nested Loops

Informatica
Adaptive Loop Tiling for a Multi-cluster CMP

ICA3PP '08 Proceedings of the 8th international conference on Algorithms and Architectures for Parallel Processing
A Systematic Approach to Automatically Generate Multiple Semantically Equivalent Program Versions

Ada-Europe '08 Proceedings of the 13th Ada-Europe international conference on Reliable Software Technologies
Residual Checking of Safety Properties

SPIN '08 Proceedings of the 15th international workshop on Model Checking Software
Language Extensions in Support of Compiler Parallelization

Languages and Compilers for Parallel Computing
Flow-Sensitive Loop-Variant Variable Classification in Linear Time

Languages and Compilers for Parallel Computing
Control flow optimization in loops using interval analysis

CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Outer-loop vectorization: revisited for short SIMD architectures

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Profiler and compiler assisted adaptive I/O prefetching for shared storage caches

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Formalizing a Framework for Dynamic Slicing of Program Dependence Graphs in Isabelle/HOL

TPHOLs '08 Proceedings of the 21st International Conference on Theorem Proving in Higher Order Logics
Algorithms and tool support for dynamic information flow analysis

Information and Software Technology
Transformations techniques for extracting parallelism in non-uniform nested loops

WSEAS Transactions on Computers
Smashing: Folding Space to Tile through Time

Languages and Compilers for Parallel Computing
Implementation of Sensitivity Analysis for Automatic Parallelization

Languages and Compilers for Parallel Computing
Cost minimization while satisfying hard/soft timing constraints for heterogeneous embedded systems

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Matrix-based streamization approach for improving locality and parallelism on FT64 stream processor

The Journal of Supercomputing
A compiler-directed data prefetching scheme for chip multiprocessors

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Compiler Controlled Speculation for Power Aware ILP Extraction in Dataflow Architectures

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
A Prefetching Algorithm for Multi-speed Disks

Transactions on High-Performance Embedded Architectures and Compilers I
Streaming implementation of a sequential decompression algorithm on an FPGA

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Reducing memory requirements of resource-constrained applications

ACM Transactions on Embedded Computing Systems (TECS)
A SIMD optimization framework for retargetable compilers

ACM Transactions on Architecture and Code Optimization (TACO)
Affine and unimodular transformations for non-uniform nested loops

ICCOMP'08 Proceedings of the 12th WSEAS international conference on Computers
Optimization of tele-immersion codes

Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
Design and implementation of a queue compiler

Microprocessors & Microsystems
Parallelization Approaches for Hardware Accelerators - Loop Unrolling Versus Loop Partitioning

ARCS '09 Proceedings of the 22nd International Conference on Architecture of Computing Systems
Cache-aware partitioning of multi-dimensional iteration spaces

SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
Synchronization optimizations for efficient execution on multi-cores

Proceedings of the 23rd international conference on Supercomputing
Chunking parallel loops in the presence of synchronization

Proceedings of the 23rd international conference on Supercomputing
On approximating the ideal random access machine by physical machines

Journal of the ACM (JACM)
Program locality analysis using reuse distance

ACM Transactions on Programming Languages and Systems (TOPLAS)
On PDG-based noninterference and its modular proof

Proceedings of the ACM SIGPLAN Fourth Workshop on Programming Languages and Analysis for Security
Refactoring sequential Java code for concurrency via concurrent libraries

ICSE '09 Proceedings of the 31st International Conference on Software Engineering
A case study on compiler optimizations for the Intel® Core™ 2 duo processor

International Journal of Parallel Programming
Analysis of imperative XML programs

Information Systems
Automatic parallelization for graphics processing units

PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
Efficient Mapping of Multiresolution Image Filtering Algorithms on Graphics Processors

SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Parallel programming with object assemblies

Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
A directive-based MPI code generator for Linux PC clusters

The Journal of Supercomputing
Into the Loops: Practical Issues in Translation Validation for Optimizing Compilers

Electronic Notes in Theoretical Computer Science (ENTCS)
Implementing the PGI Accelerator model

Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
Optimal interprocedural program optimization: a new framework and its application

Optimal interprocedural program optimization: a new framework and its application
Algorithms for memory hierarchies: advanced lectures

Algorithms for memory hierarchies: advanced lectures
Compiling for reconfigurable computing: A survey

ACM Computing Surveys (CSUR)
Loop transformations for reducing data space requirements of resource-constrained applications

SAS'03 Proceedings of the 10th international conference on Static analysis
Address register assignment for reducing code size

CC'03 Proceedings of the 12th international conference on Compiler construction
Integrating high-level optimizations in a production compiler: design and implementation experience

CC'03 Proceedings of the 12th international conference on Compiler construction
Advanced symbolic analysis for compilers: new techniques and algorithms for symbolic program analysis and optimization

Advanced symbolic analysis for compilers: new techniques and algorithms for symbolic program analysis and optimization
Pipelined parallelization in HPF programs on the Earth Simulator

ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Performance evaluation of compiler controlled power saving scheme

ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Simple section interchange and properties of non-computable functions

Science of Computer Programming
Computation mapping for multi-level storage cache hierarchies

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Cashing in on hints for better prefetching and caching in PVFS and MPI-IO

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
DMATiler: revisiting loop tiling for direct memory access

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Algorithmic issues in grid computing

Algorithms and theory of computation handbook
Strider: Runtime Support for Optimizing Strided Data Accesses on Multi-Cores with Explicitly Managed Memories

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Tailoring a self-distributing architecture to a cluster computer environment

EURO-PDP'00 Proceedings of the 8th Euromicro conference on Parallel and distributed processing
An automated approach to improve communication-computation overlap in clusters

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Automatically translating a general purpose C++ image processing library for GPUs

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A general data dependence analysis to nested loop using integer interval theory

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Dynamic multi phase scheduling for heterogeneous clusters

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Compiler-directed memory management for heterogeneous MPSoCs

Journal of Systems Architecture: the EUROMICRO Journal
Loop transformations: convexity, pruning and optimization

Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Thread contracts for safe parallelism

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Code transformations for embedded reconfigurable computing architectures

GTTSE'09 Proceedings of the 3rd international summer school conference on Generative and transformational techniques in software engineering III
Compiler-guided leakage optimization for banked scratch-pad memories

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Frameworks for multi-core architectures: a comprehensive evaluation using 2D/3D image registration

ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
Loop Distribution and Fusion with Timing and Code Size Optimization

Journal of Signal Processing Systems
Parallel Low-Storage Runge-Kutta Solvers for ODE Systems with Limited Access Distance

International Journal of High Performance Computing Applications
Data layout transformation for stencil computations on short-vector SIMD architectures

CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
An automatic parallelization framework for algebraic computation systems

Proceedings of the 36th international symposium on Symbolic and algebraic computation
Efficient stack distance computation for priority replacement policies

Proceedings of the 8th ACM International Conference on Computing Frontiers
Natural instruction level parallelism-aware compiler for high-performance QueueCore processor architecture

The Journal of Supercomputing
Adaptive parallel approximate similarity search for responsive multimedia retrieval

Proceedings of the 20th ACM international conference on Information and knowledge management
Compiler control power saving scheme for multi core processors

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Code transformations for one-pass analysis

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Using machine learning to improve automatic vectorization

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Beyond iteration vectors: instancewise relational abstract domains

SAS'06 Proceedings of the 13th international conference on Static Analysis
Secure execution of computations in untrusted hosts

Ada-Europe'06 Proceedings of the 11th Ada-Europe international conference on Reliable Software Technologies
A new carried-dependence self-scheduling algorithm

ICCSA'05 Proceedings of the 2005 international conference on Computational Science and its Applications - Volume Part I
Controller synthesis for mapping partitioned programs on array architectures

ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
Aggressive loop fusion for improving locality and parallelism

ISPA'05 Proceedings of the Third international conference on Parallel and Distributed Processing and Applications
An incremental compilation approach for OpenMP applications

NPC'05 Proceedings of the 2005 IFIP international conference on Network and Parallel Computing
Loop distribution and fusion with timing and code size optimization for embedded DSPs

EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
Induction variable analysis with delayed abstractions

HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Cooperative parallelization

Proceedings of the International Conference on Computer-Aided Design
Optimization of dense matrix multiplication on IBM Cyclops-64: challenges and experiences

Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
On dependence analysis for SIMD enhanced processors

VECPAR'04 Proceedings of the 6th international conference on High Performance Computing for Computational Science
Generalized index-set splitting

CC'05 Proceedings of the 14th international conference on Compiler Construction
A compiler-based approach to data security

CC'05 Proceedings of the 14th international conference on Compiler Construction
Verification of source code transformations by program equivalence checking

CC'05 Proceedings of the 14th international conference on Compiler Construction
A geometric approach for partitioning n-dimensional non-rectangular iteration spaces

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
An ILP-Based approach to locality optimization

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Performance of OSCAR multigrain parallelizing compiler on SMP servers

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
A matrix-type for performance–portability

PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
A study of performance scalability by parallelizing loop iterations on multi-core SMPs

ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Hierarchical parallelism control for multigrain parallel processing

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Automatic detection of saturation and clipping idioms

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
A hybrid strategy based on data distribution and migration for optimizing memory locality

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Effect of optimizations on performance of OpenMP programs

HiPC'04 Proceedings of the 11th international conference on High Performance Computing
Automatic FIR filter generation for FPGAs

SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Optimizing local memory allocation and assignment through a decoupled approach

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Memory space conscious loop iteration duplication for reliable execution

SAS'05 Proceedings of the 12th international conference on Static Analysis
Impact of array data flow analysis on the design of energy-efficient circuits

PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
Matrix-based programming optimization for improving memory hierarchy performance on Imagine

ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
Enhancements to policy distribution for control flow and looping

DSOM'05 Proceedings of the 16th IFIP/IEEE Ambient Networks international conference on Distributed Systems: operations and Management
Polyhedral code generation in the real world

CC'06 Proceedings of the 15th international conference on Compiler Construction
An approach for semiautomatic locality optimizations using OpenMP

PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2
Whole-function vectorization

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Neighborhood-aware data locality optimization for NoC-based multicores

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Communication-free data alignment for arrays with exponential references in parallelizing compilers for scalable parallel systems

The Journal of Supercomputing
Analysis of pure methods using garbage collection

Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Dynamic trace-based analysis of vectorization potential of applications

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Finding, expressing and managing parallelism in programs executed on clusters of workstations

Computer Communications
Optimization techniques for efficient HTA programs

Parallel Computing
Static detection of loop-invariant data structures

ECOOP'12 Proceedings of the 26th European conference on Object-Oriented Programming
Efficient backprojection-based synthetic aperture radar computation with many-core processors

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Tiling stencil computations to maximize parallelism

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Compiler-directed file layout optimization for hierarchical storage systems

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A framework for low-communication 1-D FFT

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Optimizing chip multiprocessor work distribution using dynamic compilation

Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
Architecture-based optimization for mapping scientific applications to Imagine

ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
API compilation for image hardware accelerators

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Polyhedral parallel code generation for CUDA

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Automatic speculative parallelization of loops using polyhedral dependence analysis

Proceedings of the First International Workshop on Code OptimiSation for MultI and many Cores
A Transformation Framework for Optimizing Task-Parallel Programs

ACM Transactions on Programming Languages and Systems (TOPLAS)
PolyGLoT: a polyhedral loop transformation framework for a graphical dataflow language

CC'13 Proceedings of the 22nd international conference on Compiler Construction
Parallel execution of Java loops on Graphics Processing Units

Science of Computer Programming
When polyhedral transformations meet SIMD code generation

Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Location-aware cache management for many-core processors with deep cache hierarchy

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hybrid Hexagonal/Classical Tiling for GPUs

Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Improving polyhedral code generation for high-level synthesis

Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Recovering memory access patterns of executable programs

Science of Computer Programming
Optimal eviction policies for stochastic address traces

Theoretical Computer Science
Leveraging GPUs using cooperative loop speculation

ACM Transactions on Architecture and Code Optimization (TACO)
Compiler-directed file layout optimization for hierarchical storage systems

Scientific Programming - Selected Papers from Super Computing 2012
Efficient backprojection-based synthetic aperture radar computation with many-core processors

Scientific Programming - Selected Papers from Super Computing 2012
A framework for low-communication 1-D FFT

Scientific Programming - Selected Papers from Super Computing 2012

Quantified Score

Hi-index	0.05

Abstract