Distributed execution of functional programs using serial combinators
IEEE Transactions on Computers
An empirical study of automatic restructuring of nonnumerical programs for parallel processors
IEEE Transactions on Computers
Strictness analysis—a practical approach
Proc. of a conference on Functional programming languages and computer architecture
Bulldog: a compiler for VLSI architectures
Compilers: principles, techniques, and tools
Advanced compiler optimizations for supercomputers
Communications of the ACM - Special issue on parallelism
Multiplication by integer constants
Software—Practice & Experience
Interprocedural constant propagation
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Interprocedural dependence analysis and parallelization
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Direct parallelization of call statements
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
ORBIT: an optimizing compiler for scheme
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Revised report on the algorithmic language scheme
ACM SIGPLAN Notices
Annual review of computer science vol. 1, 1986
Structure and interpretation of computer programs
Automatic translation of FORTRAN programs to vector form
ACM Transactions on Programming Languages and Systems (TOPLAS)
Memory storage patterns in parallel processing
Superoptimizer: a look at the smallest program
ASPLOS II Proceedings of the second international conference on Architectural support for programming languages and operating systems
Guided self-scheduling: A practical scheduling scheme for parallel supercomputers
IEEE Transactions on Computers
A VLIW architecture for a trace Scheduling Compiler
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
Estimating interlock and improving balance for pipelined architectures
Journal of Parallel and Distributed Computing
Analysis of interprocedural side effects in a parallel programming environment
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Loop quantization: a generalized loop unwinding technique
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
An overview of the PTRAN analysis system for multiprocessing
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Minimizing register usage penalty at procedure calls
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Principles of runtime support for parallel processors
ICS '88 Proceedings of the 2nd international conference on Supercomputing
A framework for determining useful parallelism
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Compiling issues for supercomputers
Proceedings of the 1988 ACM/IEEE conference on Supercomputing
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Advanced loop optimizations for parallel computers
Proceedings of the 1st International Conference on Supercomputing
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Evaluating the performance of four snooping cache coherency protocols
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Achieving high instruction cache performance with an optimizing compiler
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Fast interprocedural alias analysis
POPL '89 Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Scans as Primitive Parallel Operations
IEEE Transactions on Computers
The parascope editor: an interactive parallel programming tool
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
A comparison study of automatically vectorizing Fortran compilers
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Computer architecture: a quantitative approach
Data optimization: allocation of arrays to reduce communication on SIMD machines
Journal of Parallel and Distributed Computing - Massively parallel computation
Journal of Parallel and Distributed Computing - Special issue: software tools for parallel programming and visualization
IEEE Transactions on Computers
The priority-based coloring approach to register allocation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Updating distributed variables in local computations
Concurrency: Practice and Experience
Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Improving register allocation for subscripted variables
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
How to read floating point numbers accurately
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
How to print floating-point numbers accurately
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Compilation of Haskell array comprehensions for scientific computing
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Constant propagation with conditional branches
ACM Transactions on Programming Languages and Systems (TOPLAS)
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Limits of instruction-level parallelism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Strip mining on SIMD architectures
ICS '91 Proceedings of the 5th international conference on Supercomputing
Uniform techniques for loop optimization
ICS '91 Proceedings of the 5th international conference on Supercomputing
An experiment with inline substitution
Software—Practice & Experience
Loop distribution with arbitrary control flow
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Efficient and exact data dependence analysis
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Procedure merging with instruction caches
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Fortran at ten gigaflops: the connection machine convolution compiler
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Single instruction stream parallelism is greater than two
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Dynamic Processor Self-Scheduling for General Parallel Nested Loops
IEEE Transactions on Computers
Intelligent program optimization and parallelization for parallel computers
Interprocedural transformations for parallel code generation
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Compiling programs for nonshared memory machines
Compiling with continuations
Alpha architecture reference manual
Unexpected side effects of inline substitution: a case study
ACM Letters on Programming Languages and Systems (LOPLAS)
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
Compiling Fortran D for MIMD distributed-memory machines
Communications of the ACM
A practical algorithm for exact array dependence analysis
Communications of the ACM
New CPU benchmark suites from SPEC
COMPCON '92 Proceedings of the thirty-seventh international conference on COMPCON
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
A general framework for iteration-reordering loop transformations
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Relaxing SIMD control flow constraints using loop transformations
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
A dynamic scheduling method for irregular parallel programs
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Eliminating branches using a superoptimizer and the GNU C compiler
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Automatic data mapping for distributed-memory parallel computers
ICS '92 Proceedings of the 6th international conference on Supercomputing
Array privatization for parallel execution of loops
ICS '92 Proceedings of the 6th international conference on Supercomputing
IBM Journal of Research and Development
Interprocedural modification side effect analysis with pointer aliasing
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Orchestrating interactions among parallel computations
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Cray Y-MP C90: system features and early benchmark results
Parallel Computing
Array-data flow analysis and its use in array privatization
POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Automatic array alignment in data-parallel programs
POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects
POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Static and dynamic evaluation of data dependence analysis
ICS '93 Proceedings of the 7th international conference on Supercomputing
CMAX: a Fortran translator for the connection machine system
ICS '93 Proceedings of the 7th international conference on Supercomputing
Automatic data partitioning on distributed memory multicomputers
Preliminary experiences with the Fortran D compiler
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Improving locality and parallelism in nested loops
Instruction-level parallel processing: history, overview, and perspective
The Journal of Supercomputing - Special issue on instruction-level parallelism
The superblock: an effective technique for VLIW and superscalar compilation
The Journal of Supercomputing - Special issue on instruction-level parallelism
Automatic data allocation to minimize communication on SIMD machines
The Journal of Supercomputing
An optimizing Fortran D compiler for MIMD distributed-memory machines
Memory-hierarchy management
An Algorithm for Translating Boolean Expressions
Journal of the ACM (JACM)
ALPHA—An Automatic Programming System of High Efficiency
Journal of the ACM (JACM)
A Transformation System for Developing Recursive Programs
Journal of the ACM (JACM)
Program Improvement by Source-to-Source Transformation
Journal of the ACM (JACM)
Code Generation for Expressions with Common Subexpressions
Journal of the ACM (JACM)
A Survey of Parallel Machine Organization and Programming
ACM Computing Surveys (CSUR)
Experience with the SETL Optimizer
ACM Transactions on Programming Languages and Systems (TOPLAS)
Global optimization by suppression of partial redundancies
Communications of the ACM
Communications of the ACM - Special issue on computer architecture
Assembling code for machines with span-dependent instructions
Communications of the ACM
An analysis of inline substitution for a structured programming language
Communications of the ACM
The parallel execution of DO loops
Communications of the ACM
Parallel processing: a smart compiler and a dumb machine
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
A unified approach to global program optimization
POPL '73 Proceedings of the 1st annual ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Optimizing Supercompilers for Supercomputers
Partitioning and Scheduling Parallel Programs for Multiprocessors
Parallel Programming and Compilers
Dependence Analysis for Supercomputing
Conversion of control dependence to data dependence
POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Dependence graphs and compiler optimizations
POPL '81 Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A precise inter-procedural data flow algorithm
POPL '81 Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
An efficient way to find the side effects of procedure calls and the aliases of variables
POPL '79 Proceedings of the 6th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Program Flow Analysis: Theory and Application
Structure of Computers and Computations
Architecture of the Pentium Microprocessor
IEEE Micro
False Sharing and Spatial Locality in Multiprocessor Caches
IEEE Transactions on Computers
An Efficient Data Dependence Analysis for Parallelizing Compilers
IEEE Transactions on Parallel and Distributed Systems
An Empirical Study of Fortran Programs for Parallelizing Compilers
IEEE Transactions on Parallel and Distributed Systems
Compiling Communication-Efficient Programs for Massively Parallel Machines
IEEE Transactions on Parallel and Distributed Systems
Compiling Global Name-Space Parallel Loops for Distributed Execution
IEEE Transactions on Parallel and Distributed Systems
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
The Power Test for Data Dependence
IEEE Transactions on Parallel and Distributed Systems
Loop-Level Parallelism in Numeric and Symbolic Programs
IEEE Transactions on Parallel and Distributed Systems
Perfect Pipelining: A New Loop Parallelization Technique
ESOP '88 Proceedings of the 2nd European Symposium on Programming
Experience in the Automatic Parallelization of Four Perfect-Benchmark Programs
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
The Alignment-Distribution Graph
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness
CASCON '94 Proceedings of the 1994 conference of the Centre for Advanced Studies on Collaborative research
Predicting the effects of optimization on a procedure body
SIGPLAN '79 Proceedings of the 1979 SIGPLAN symposium on Compiler construction
Register allocation & spilling via graph coloring
SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction
The ILLIAC IV FORTRAN compiler
Proceedings of the conference on Programming languages and compilers for parallel and vector machines
The Paralyzer: Ivtran's Parallelism Analyzer and Synthesizer
Proceedings of the conference on Programming languages and compilers for parallel and vector machines
Fortran for the Texas Instruments ASC system
Proceedings of the conference on Programming languages and compilers for parallel and vector machines
Global common subexpression elimination
Proceedings of a symposium on Compiler optimization
Control and data dependence for program transformations
Improving the performance of virtual memory computers
Speedup of ordinary programs
Dependence analysis for subscripted variables and its application to program transformations
Compiling for locality of reference
Arithmetic shifting considered harmful
ACM SIGPLAN Notices
Programming languages and their compilers: Preliminary notes
A programming language
Abstract interpretation and low-level code optimization
PEPM '95 Proceedings of the 1995 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation
Improving instruction-level parallelism by loop unrolling and dynamic memory disambiguation
Proceedings of the 28th annual international symposium on Microarchitecture
Efficient and language-independent mobile programs
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Let-floating: moving bindings to give faster programs
Proceedings of the first ACM SIGPLAN international conference on Functional programming
ACM Computing Surveys (CSUR)
Analysis of benchmark characteristics and benchmark performance prediction
ACM Transactions on Computer Systems (TOCS)
Compiler-directed page coloring for multiprocessors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Address calculation for retargetable compilation and exploration of instruction-set architectures
DAC '96 Proceedings of the 33rd annual Design Automation Conference
Fusion of Loops for Parallelism and Locality
IEEE Transactions on Parallel and Distributed Systems
Low-power mapping of behavioral arrays to multiple memories
ISLPED '96 Proceedings of the 1996 international symposium on Low power electronics and design
A new optimization technique for improving resource exploitation and critical path minimization
ISSS '97 Proceedings of the 10th international symposium on System synthesis
Maximizing parallelism and minimizing synchronization with affine transforms
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Potential-driven statistical ordering of transformations
DAC '97 Proceedings of the 34th annual Design Automation Conference
Manufacturing cheap, resilient, and stealthy opaque constructs
POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
DAC '98 Proceedings of the 35th annual Design Automation Conference
A methodology for guided behavioral-level optimization
DAC '98 Proceedings of the 35th annual Design Automation Conference
Loop fusion in high performance Fortran
ICS '98 Proceedings of the 12th international conference on Supercomputing
On the Removal of Anti- and Output-Dependences
International Journal of Parallel Programming
Compact and efficient presentation conversion code
IEEE/ACM Transactions on Networking (TON)
Architecture-level dependence analysis in support of software maintenance
ISAW '98 Proceedings of the third international workshop on Software architecture
Schedule-independent storage mapping for loops
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Software watermarking: models and dynamic embeddings
Proceedings of the 26th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
New tiling techniques to improve cache temporal locality
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Towards automated synthesis of data mining programs
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Accelerating APL programs with SAC
Proceedings of the conference on APL '99: On track to the 21st century
ACM Transactions on Programming Languages and Systems (TOPLAS)
Power optimization using divide-and-conquer techniques for minimization of the number of operations
ACM Transactions on Design Automation of Electronic Systems (TODAES)
APL '98 Proceedings of the APL98 conference on Array processing language
ACM Transactions on Computer Systems (TOCS)
A fuzzy approach to automatic data locality optimization
SAC '96 Proceedings of the 1996 ACM symposium on Applied Computing
Cache-optimal methods for bit-reversals
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Memory characteristics of iterative methods
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Protecting Java code via code obfuscation
Crossroads - Special issue on robotics
Optimized unrolling of nested loops
Proceedings of the 14th international conference on Supercomputing
Programming languages and systems for prototyping concurrent applications
ACM Computing Surveys (CSUR)
ACM Transactions on Programming Languages and Systems (TOPLAS)
Exploiting Wavefront Parallelism on Large-Scale Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Exploiting non-uniform reuse for cache optimization
Proceedings of the 2001 ACM symposium on Applied computing
Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences
IEEE Transactions on Parallel and Distributed Systems
Computer aided hand tuning (CAHT): “applying case-based reasoning to performance tuning”
ICS '01 Proceedings of the 15th international conference on Supercomputing
Blocking and array contraction across arbitrarily nested loops using affine partitioning
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Loop fusion for memory space optimization
Proceedings of the 14th international symposium on Systems synthesis
Source code transformation based on software cost analysis
Proceedings of the 14th international symposium on Systems synthesis
Source code optimization and profiling of energy consumption in embedded systems
ISSS '00 Proceedings of the 13th international symposium on System synthesis
An empirical evaluation of high level transformations for embedded processors
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Energy aware compilation for DSPs with SIMD instructions
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Optimized Unrolling of Nested Loops
International Journal of Parallel Programming
Memory Design and Exploration for Low Power, Embedded Systems
Journal of VLSI Signal Processing Systems - Special issue on signal processing systems design and implementation
Relational programs: An architecture for robust real-time safety-critical process-control systems
Annals of Software Engineering
Handling Global Constraints in Compiler Strategy
International Journal of Parallel Programming
Compilation Techniques for Multimedia Processors
International Journal of Parallel Programming
A Vectorizing Compiler for Multimedia Extensions
International Journal of Parallel Programming
NaraView: An Interactive 3D Visualization System for Parallelization of Programs
International Journal of Parallel Programming
Reconfigurable Instruction Set Processors from a Hardware/Software Perspective
IEEE Transactions on Software Engineering
Array recovery and high-level transformations for DSP applications
ACM Transactions on Embedded Computing Systems (TECS)
A finite state machine based formal model of software pipelined loops with conditions
Progress in computer research
A Correction Method for Parallel Loop Execution
ICCS '02 Proceedings of the International Conference on Computational Science-Part I
Combining Loop Fusion with Prefetching on Shared-memory Multiprocessors
ICPP '97 Proceedings of the international Conference on Parallel Processing
Time-Stamping Algorithms for Parallelization of Loops at Run-Time
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
A BSP Approach to the Scheduling of Tightly-Nested Loops
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
A Formal Model of Software Pipelining Loops with Conditions
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Feedback Guided Scheduling of Nested Loops
PARA '00 Proceedings of the 5th International Workshop on Applied Parallel Computing, New Paradigms for HPC in Industry and Academia
A Technique for Parallel Loop Execution
PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing
Foundations of Cognitive Support: Toward Abstract Patterns of Usefulness
DSV-IS '02 Proceedings of the 9th International Workshop on Interactive Systems. Design, Specification, and Verification
Non-approximability of the Bulk Synchronous Task Scheduling Problem
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Compiler-Directed Reordering of Data by Cyclic Graph Coloring
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Analysis of Multithreaded Programs
SAS '01 Proceedings of the 8th International Symposium on Static Analysis
Reducing Cache Conflicts by a Parametrized Memory Mapping
ParNum '99 Proceedings of the 4th International ACPC Conference Including Special Tracks on Parallel Numerics and Parallel Computing in Image Processing, Video Processing, and Multimedia: Parallel Computation
Advanced Scalarization of Array Syntax
CC '00 Proceedings of the 9th International Conference on Compiler Construction
Techniques for Effectively Exploiting a Zero Overhead Loop Buffer
CC '00 Proceedings of the 9th International Conference on Compiler Construction
Alias Analysis by Means of a Model Checker
CC '01 Proceedings of the 10th International Conference on Compiler Construction
A Case Study: Effects of WITH-Loop-Folding on the NAS Benchmark MG in SAC
IFL '98 Selected Papers from the 10th International Workshop on Implementation of Functional Languages
Improving Cache Effectiveness through Array Data Layout Manipulation in SAC
IFL '00 Selected Papers from the 12th International Workshop on Implementation of Functional Languages
A Compilation Scheme for a Hierarchy of Array Types
IFL '02 Selected Papers from the 13th International Workshop on Implementation of Functional Languages
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Compiler optimization-space exploration
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Predicting the impact of optimizations for embedded systems
Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
Memory disambiguation for general-purpose applications
CASCON '95 Proceedings of the 1995 conference of the Centre for Advanced Studies on Collaborative research
Template-based program restructuring - initial experience
CASCON '95 Proceedings of the 1995 conference of the Centre for Advanced Studies on Collaborative research
Reducing Address Bus Transitions for Low Power Memory Mapping
EDTC '96 Proceedings of the 1996 European conference on Design and Test
Using cache optimizing compiler for managing software cache on distributed shared memory system
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Address Code and Arithmetic Optimizations for Embedded Systems
ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
Anticipatory Optimization in Domain Specific Translation
ICSR '98 Proceedings of the 5th International Conference on Software Reuse
Loop Alignment for Memory Accesses Optimization
Proceedings of the 12th international symposium on System synthesis
Using Graph Models in Retargetable Optimizing Compilers for Microprocessors with VLIW Architectures
Cybernetics and Systems Analysis
Memory Hierarchy Management for Iterative Graph Structures
IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Automatic code generation for a convection scheme
Proceedings of the 2003 ACM symposium on Applied computing
Symbolic transfer function-based approaches to certified compilation
Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on Principles of programming languages
What can we gain by unfolding loops?
ACM SIGPLAN Notices
Single Assignment C: efficient support for high-level array operations in a functional setting
Journal of Functional Programming
Proceedings of the 2004 ACM symposium on Applied computing
Data Reuse Analysis Technique for Software-Controlled Memory Hierarchies
Proceedings of the conference on Design, automation and test in Europe - Volume 1
Improving the adaptability of multi-mode systems via program steering
ISSTA '04 Proceedings of the 2004 ACM SIGSOFT international symposium on Software testing and analysis
An innovative low-power high-performance programmable signal processor for digital communications
IBM Journal of Research and Development
Compiler based exploration of DSP energy savings by SIMD operations
Proceedings of the 2004 Asia and South Pacific Design Automation Conference
Improving Data Locality by Array Contraction
IEEE Transactions on Computers
An extended ANSI C for processors with a multimedia extension
International Journal of Parallel Programming
Control Flow Driven Splitting of Loop Nests at the Source Code Level
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Coordinated parallelizing compiler optimizations and high-level synthesis
ACM Transactions on Design Automation of Electronic Systems (TODAES)
A New Architecture for Transformation-Based Generators
IEEE Transactions on Software Engineering
Automatically partitioning packet processing applications for pipelined architectures
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Encyclopedia of Computer Science
Analyzing data reuse for cache reconfiguration
ACM Transactions on Embedded Computing Systems (TECS)
Data dependence analysis techniques for increased accuracy and extracted parallelism
International Journal of Parallel Programming - Special issue II: The 17th annual international conference on supercomputing (ICS'03)
Compiler transformations for effectively exploiting a zero overhead loop buffer
Software—Practice & Experience
Journal of VLSI Signal Processing Systems
Compiler optimization of embedded applications for an adaptive SoC architecture
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
On minimizing materializations of array-valued temporaries
ACM Transactions on Programming Languages and Systems (TOPLAS)
A memory model for scientific algorithms on graphics processors
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
A calculus for parallel computations over multidimensional dense arrays
Computer Languages, Systems and Structures
DRDU: A data reuse analysis technique for efficient scratch-pad memory management
ACM Transactions on Design Automation of Electronic Systems (TODAES)
SAC: off-the-shelf support for data-parallelism on multicores
Proceedings of the 2007 workshop on Declarative aspects of multicore programming
Adaptive Service Composition in Flexible Processes
IEEE Transactions on Software Engineering
An operation stacking framework for large ensemble computations
Proceedings of the 21st annual international conference on Supercomputing
Incremental hierarchical memory size estimation for steering of loop transformations
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Dynamic graph-based software fingerprinting
ACM Transactions on Programming Languages and Systems (TOPLAS)
Structural function inlining technique for structurally recursive XML queries
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Influence of procedure cloning on WCET prediction
CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
SCCP/x: a compilation profile to support testing and verification of optimized code
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Software controlled memory layout reorganization for irregular array access patterns
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
The potential of trace-level parallelism in Java programs
Proceedings of the 5th international symposium on Principles and practice of programming in Java
Quasi-static scheduling for safe futures
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
WCET-driven, code-size critical procedure cloning
SCOPES '08 Proceedings of the 11th international workshop on Software & compilers for embedded systems
AIC'05 Proceedings of the 5th WSEAS International Conference on Applied Informatics and Communications
A parallel dynamic compiler for CIL bytecode
ACM SIGPLAN Notices
Construction of speculative optimization algorithms
Programming and Computing Software
Compiler driven data layout optimization for regular/irregular array access patterns
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
A Case Study of Some Issues in the Optimization of Fortran 90 Array Notation
Scientific Programming
Hiding Software Watermarks in Loop Structures
SAS '08 Proceedings of the 15th international symposium on Static Analysis
Using Padding to Optimize Locality in Scientific Applications
ICCS '08 Proceedings of the 8th international conference on Computational Science, Part I
Explicit Dependence Metadata in an Active Visual Effects Library
Languages and Compilers for Parallel Computing
Program transformation for numerical precision
Proceedings of the 2009 ACM SIGPLAN workshop on Partial evaluation and program manipulation
Optimizing parallelism for nested loops with iterational and instructional retiming
Journal of Embedded Computing - Selected papers of EUC 2005
A study of potential parallelism among traces in Java programs
Science of Computer Programming
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
The effect of unrolling and inlining for Python bytecode optimizations
SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
Source level merging of independent programs
Journal of Parallel and Distributed Computing
Placement optimization using data context collected during garbage collection
Proceedings of the 2009 international symposium on Memory management
Alchemist: A Transparent Dependence Distance Profiling Infrastructure
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Iterative development of parallel programs in the ParJava environment
Programming and Computing Software
Future Generation Computer Systems
Automating the generation of composed linear algebra kernels
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Adaptive scratch pad memory management for dynamic behavior of multimedia applications
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
A directive-based MPI code generator for Linux PC clusters
The Journal of Supercomputing
Energy-Aware Design of Service-Based Applications
ICSOC-ServiceWave '09 Proceedings of the 7th International Joint Conference on Service-Oriented Computing
A highly flexible, parallel virtual machine: design and experience of ILDJIT
Software—Practice & Experience
Implementing the PGI Accelerator model
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
Algorithms for memory hierarchies: advanced lectures
Algorithms for memory hierarchies: advanced lectures
IFL'02 Proceedings of the 14th international conference on Implementation of functional languages
Locality enhancement by array contraction
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Towards a source level compiler: source level modulo scheduling
Program analysis and compilation, theory and practice
Static reuse distances for locality-based optimizations in MATLAB
Proceedings of the 24th ACM International Conference on Supercomputing
Exploiting finite precision information to guide data-flow mapping
Proceedings of the 47th Design Automation Conference
Finding the best compromise in compiling compound loops to Verilog
Journal of Systems Architecture: the EUROMICRO Journal
Improving MPI communication via data type fission
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Engineering scalable, cache and space efficient tries for strings
The VLDB Journal — The International Journal on Very Large Data Bases
Computing the correct Increment of Induction Pointers with application to loop unrolling
Journal of Systems Architecture: the EUROMICRO Journal
Techniques and tools for dynamic optimization
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Parallelization of module network structure learning and performance tuning on SMP
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A code motion technique for accelerating general-purpose computation on the GPU
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Transparent runtime parallelization of the R scripting language
Journal of Parallel and Distributed Computing
Redesigning the string hash table, burst trie, and BST to exploit cache
Journal of Experimental Algorithmics (JEA)
MTPP'10 Proceedings of the Second Russia-Taiwan conference on Methods and tools of parallel programming multicomputers
IFL'09 Proceedings of the 21st international conference on Implementation and application of functional languages
DESOLA: An active linear algebra library using delayed evaluation and runtime code generation
Science of Computer Programming
Proceedings of the tenth international conference on Aspect-oriented software development
Tackling cache-line stealing effects using run-time adaptation
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Neural, Parallel & Scientific Computations
Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators
Proceedings of the 38th annual international symposium on Computer architecture
An experimental approach to the performance penalty of the use of classes in Fortran 95
Advances in Engineering Software
HAWKEYE: effective discovery of dataflow impediments to parallelization
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Performance evaluation of highly efficient techniques for software implementation of LFSR
Computers and Electrical Engineering
Minimizing data size for efficient data reuse in grid-enabled medical applications
ISBMDA'06 Proceedings of the 7th international conference on Biological and Medical Data Analysis
Programmable data dependencies and placements
DAMP '12 Proceedings of the 7th workshop on Declarative aspects and applications of multicore programming
Optimizing nested loops with iterational and instructional retiming
EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
Compiler-optimized kernels: an efficient alternative to hand-coded inner kernels
ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part V
Compiling high-level languages for vector architectures
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Removing impediments to loop fusion through code transformations
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Architecture- and OS-Independent binary-level dynamic test generation
ICICS'09 Proceedings of the 11th international conference on Information and Communications Security
Loop transformations in the ahead-of-time optimization of java bytecode
CC'06 Proceedings of the 15th international conference on Compiler Construction
An approach for semiautomatic locality optimizations using OpenMP
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2
Mat-core: a decoupled matrix core extension for general-purpose processors
Neural, Parallel & Scientific Computations
Integrating Memory Optimization with Mapping Algorithms for Multi-Processors System-on-Chip
ACM Transactions on Embedded Computing Systems (TECS)
Static detection of loop-invariant data structures
ECOOP'12 Proceedings of the 26th European conference on Object-Oriented Programming
Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators
ACM Transactions on Computer Systems (TOCS)
Data-driven equivalence checking
Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
Vectorization past dependent branches through speculation
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
The benefits of using variable-length pipelined operations in high-level synthesis
ACM Transactions on Embedded Computing Systems (TECS)
HEAP: A Highly Efficient Adaptive multi-Processor framework
Microprocessors & Microsystems
In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by the program using transformations based on the analysis of scalar quantities and data-flow techniques. In contrast, optimizations for high-performance superscalar, vector, and parallel processors maximize parallelism and memory locality with transformations that rely on tracking the properties of arrays using loop dependence analysis.

This survey is a comprehensive overview of the important high-level program restructuring techniques for imperative languages such as C and Fortran. Transformations for both sequential and various types of parallel architectures are covered in depth. We describe the purpose of each transformation, explain how to determine if it is legal, and give an example of its application.

Programmers wishing to enhance the performance of their code can use this survey to improve their understanding of the optimizations that compilers can perform, or as a reference for techniques to be applied manually. Students can obtain an overview of optimizing compiler technology. Compiler writers can use this survey as a reference for most of the important optimizations developed to date, and as a bibliographic reference for the details of each optimization. Readers are expected to be familiar with modern computer architecture and basic program compilation techniques.