Folklore confirmed: reducible flow graphs are exponentially larger
POPL '03 Proceedings of the 30th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Reducing Communication Cost for Parallelizing Irregular Scientific Codes
PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing
TEST: a tracer for extracting speculative threads
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A comparison of empirical and model-driven optimization
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Predicting whole-program locality through reuse distance analysis
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
The Jrpm system for dynamically parallelizing Java programs
Proceedings of the 30th annual international symposium on Computer architecture
Sourcebook of parallel computing
Transforming Complex Loop Nests for Locality
The Journal of Supercomputing
What can we gain by unfolding loops?
ACM SIGPLAN Notices
Single Assignment C: efficient support for high-level array operations in a functional setting
Journal of Functional Programming
Improving effective bandwidth through compiler enhancement of global cache reuse
Journal of Parallel and Distributed Computing
Single-Dimension Software Pipelining for Multi-Dimensional Loops
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A Dynamically Tuned Sorting Library
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Array regrouping and structure splitting using whole-program reference affinity
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Applications of storage mapping optimization to register promotion
Proceedings of the 18th annual international conference on Supercomputing
General loop fusion technique for nested loops considering timing and code size
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
The Energy Impact of Aggressive Loop Fusion
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Superword-Level Parallelism in the Presence of Control Flow
Proceedings of the international symposium on Code generation and optimization
The Challenges of Hardware Synthesis from C-Like Languages
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
The Potential of Computation Regrouping for Improving Locality
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Automatic measurement of memory hierarchy parameters
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Automatic blocking of QR and LU factorizations for locality
MSP '04 Proceedings of the 2004 workshop on Memory system performance
Scalarization using loop alignment and loop skewing
The Journal of Supercomputing
Efficient data driven run-time code generation
LCR '04 Proceedings of the 7th workshop on Workshop on languages, compilers, and run-time support for scalable systems
Shared memory multiprocessor support for functional array processing in SAC
Journal of Functional Programming
Exploiting Inter-Processor Data Sharing for Improving Behavior of Multi-Processor SoCs
ISVLSI '05 Proceedings of the IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design
Architecture-aware classical Taylor shift by 1
Proceedings of the 2005 international symposium on Symbolic and algebraic computation
Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion
International Journal of High Performance Computing Applications
Contributions to the GNU compiler collection
IBM Systems Journal
Lightweight reference affinity analysis
Proceedings of the 19th annual international conference on Supercomputing
Think globally, search locally
Proceedings of the 19th annual international conference on Supercomputing
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Automatic Thread Extraction with Decoupled Software Pipelining
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Exploiting Vector Parallelism in Software Pipelined Loops
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Quantifying Locality In The Memory Access Patterns of HPC Applications
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Translation and Run-Time Validation of Loop Transformations
Formal Methods in System Design
Automatic functional verification of memory oriented global source code transformations
HLDVT '03 Proceedings of the Eighth IEEE International Workshop on High-Level Design Validation and Test Workshop
A Compiler-Guided Approach for Reducing Disk Power Consumption by Exploiting Disk Access Locality
Proceedings of the International Symposium on Code Generation and Optimization
Compiler Optimizations to Reduce Security Overhead
Proceedings of the International Symposium on Code Generation and Optimization
Auto-vectorization of interleaved data for SIMD
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Vector LLVA: a virtual vector instruction set for media processing
Proceedings of the 2nd international conference on Virtual execution environments
Parallelization of the data encryption standard(DES) algorithm
Enhanced methods in computer security, biometric and artificial intelligence systems
Power optimizations for the MLCA using dynamic voltage scaling
SCOPES '05 Proceedings of the 2005 workshop on Software and compilers for embedded systems
Reuse analysis of indirectly indexed arrays
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Self-adapting numerical software (SANS) effort
IBM Journal of Research and Development
An empirical evaluation of chains of recurrences for array dependence testing
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
A New Genetic Algorithm for Loop Tiling
The Journal of Supercomputing
In search of a program generator to implement generic transformations for high-performance computing
Science of Computer Programming - Special issue on the first MetaOCaml workshop 2004
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
SAC: a functional array language for efficient multi-threaded execution
International Journal of Parallel Programming
Proceedings of the 20th annual international conference on Supercomputing
Complete inlining of recursive calls: beyond tail-recursion elimination
Proceedings of the 44th annual Southeast regional conference
On minimizing materializations of array-valued temporaries
ACM Transactions on Programming Languages and Systems (TOPLAS)
Using fine grain multithreading for energy efficient computing
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Loop pipelining for high-throughput stream computation using self-timed rings
Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
Improving power efficiency with compiler-assisted cache replacement
Journal of Embedded Computing - Cache exploitation in embedded systems
The Journal of Supercomputing
The rise and fall of High Performance Fortran: an historical object lesson
Proceedings of the third ACM SIGPLAN conference on History of programming languages
Improving locality for ODE solvers by program transformations
Scientific Programming
A unified evaluation framework for coarse grained reconfigurable array architectures
Proceedings of the 4th international conference on Computing frontiers
An experimental comparison of cache-oblivious and cache-conscious programs
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Optimistic parallelism requires abstractions
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Software behavior oriented parallelization
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
A Dimension Abstraction Approach to Vectorization in Matlab
Proceedings of the International Symposium on Code Generation and Optimization
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time
Proceedings of the International Symposium on Code Generation and Optimization
Predicting locality phases for dynamic memory optimization
Journal of Parallel and Distributed Computing
VLIW instruction scheduling for minimal power variation
ACM Transactions on Architecture and Code Optimization (TACO)
Electronic Notes in Theoretical Computer Science (ENTCS)
Locality optimization in wireless applications
CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Code-size conscious pipelining of imperfectly nested loops
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Optimistic parallelism benefits from data partitioning
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Parallel-stage decoupled software pipelining
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Spice: speculative parallel iteration chunk execution
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Program optimization space pruning for a multithreaded gpu
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Compiling for vector-thread architectures
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Compiling for an indirect vector register architecture
Proceedings of the 5th conference on Computing frontiers
GPU acceleration of cutoff pair potentials for molecular modeling applications
Proceedings of the 5th conference on Computing frontiers
Optimized mapping for enchancing the operation parallelism in coarse-grained reconfigurable arrays
SMO'06 Proceedings of the 6th WSEAS International Conference on Simulation, Modelling and Optimization
Design of the Java HotSpot™ client compiler for Java 6
ACM Transactions on Architecture and Code Optimization (TACO)
Automatic SIMD vectorization of chains of recurrences
Proceedings of the 22nd annual international conference on Supercomputing
Iterative optimization in the polyhedral model: part ii, multidimensional time
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Reasoning about inherent parallelism in modern object-oriented languages
ACSC '08 Proceedings of the thirty-first Australasian conference on Computer science - Volume 74
The impact of paravirtualized memory hierarchy on linear algebra computational kernels and software
HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
XARK: An extensible framework for automatic recognition of computational kernels
ACM Transactions on Programming Languages and Systems (TOPLAS)
Program optimization carving for GPU computing
Journal of Parallel and Distributed Computing
Exploiting Loop-Level Parallelism for SIMD Arrays Using OpenMP
IWOMP '07 Proceedings of the 3rd international workshop on OpenMP: A Practical Programming Model for the Multi-Core Era
Finding Synchronization-Free Parallelism Represented with Trees of Dependent Operations
ICA3PP '08 Proceedings of the 8th international conference on Algorithms and Architectures for Parallel Processing
Finding Synchronization-Free Slices of Operations in Arbitrarily Nested Loops
ICCSA '08 Proceedings of the international conference on Computational Science and Its Applications, Part II
On Validity of Program Transformations in the Java Memory Model
ECOOP '08 Proceedings of the 22nd European conference on Object-Oriented Programming
Languages and Compilers for Parallel Computing
Language Extensions in Support of Compiler Parallelization
Languages and Compilers for Parallel Computing
Exploiting SIMD Parallelism with the CGiS Compiler Framework
Languages and Compilers for Parallel Computing
Flow-Sensitive Loop-Variant Variable Classification in Linear Time
Languages and Compilers for Parallel Computing
Control flow optimization in loops using interval analysis
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Guidance of Loop Ordering for Reduced Memory Usage in Signal Processing Applications
Journal of Signal Processing Systems
Address Generation Optimization for Embedded High-Performance Processors: A Survey
Journal of Signal Processing Systems
Outer-loop vectorization: revisited for short SIMD architectures
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Redundancy elimination revisited
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Parallelizing scientific code with invasive interactive parallelization: a case study with reuseware
Proceedings of the 2008 compFrame/HPC-GECO workshop on Component based high performance
On the implementation of automatic differentiation tools
Higher-Order and Symbolic Computation
MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs
Languages and Compilers for Parallel Computing
On the Scalability of an Automatically Parallelized Irregular Application
Languages and Compilers for Parallel Computing
Scalable Implementation of Efficient Locality Approximation
Languages and Compilers for Parallel Computing
How much parallelism is there in irregular applications?
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Software Pipelining in Nested Loops with Prolog-Epilog Merging
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Deriving Efficient Data Movement from Decoupled Access/Execute Specifications
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Automatic Discovery of Coarse-Grained Parallelism in Media Applications
Transactions on High-Performance Embedded Architectures and Compilers I
Architecture-aware optimization targeting multithreaded stream computing
Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
Resource aware mapping on coarse grained reconfigurable arrays
Microprocessors & Microsystems
Design and implementation of a queue compiler
Microprocessors & Microsystems
Compiler assisted architectural exploration framework for coarse grained reconfigurable arrays
The Journal of Supercomputing
Chunking parallel loops in the presence of synchronization
Proceedings of the 23rd international conference on Supercomputing
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Fast Track: A Software System for Speculative Program Optimization
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
On approximating the ideal random access machine by physical machines
Journal of the ACM (JACM)
Program locality analysis using reuse distance
ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimistic parallelism requires abstractions
Communications of the ACM - The Status of the P versus NP Problem
A case for compiler-driven superpage allocation
Proceedings of the 47th Annual Southeast Regional Conference
Extending Automatic Parallelization to Optimize High-Level Abstractions for Multicore
IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Systematic search within an optimisation space based on Unified Transformation Framework
International Journal of Computational Science and Engineering
Automatic parallelization for graphics processing units
PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
Inferring Dataflow Properties of User Defined Table Processors
SAS '09 Proceedings of the 16th International Symposium on Static Analysis
Optimal loop parallelization for maximizing iteration-level parallelism
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Future Generation Computer Systems
The habanero multicore software research project
Proceedings of the 24th ACM SIGPLAN conference companion on Object oriented programming systems languages and applications
Extracting synchronization-free slices of operations in perfectly-nested loops
PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Compact multi-dimensional kernel extraction for register tiling
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
A program auto-parallelizer based on the component technology of optimizing compiler construction
Programming and Computing Software
Parallel loop generation and scheduling
The Journal of Supercomputing
Automatic memory partitioning and scheduling for throughput and power optimization
Proceedings of the 2009 International Conference on Computer-Aided Design
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Speculative parallelization using software multi-threaded transactions
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
MacroSS: macro-SIMDization of streaming applications
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Algorithms for memory hierarchies: advanced lectures
Algorithms for memory hierarchies: advanced lectures
On minimizing register usage of linearly scheduled algorithms with uniform dependencies
Computer Languages, Systems and Structures
Allocation and scheduling of Conditional Task Graphs
Artificial Intelligence
IFL'02 Proceedings of the 14th international conference on Implementation of functional languages
Dependence-based code generation for a CELL processor
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Exploiting speculative thread-level parallelism in data compression applications
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
New algorithms for SIMD alignment
CC'07 Proceedings of the 16th international conference on Compiler construction
Loop parallelization in multi-dimensional cartesian space
PSI'06 Proceedings of the 6th international Andrei Ershov memorial conference on Perspectives of systems informatics
Integrating high-level optimizations in a production compiler: design and implementation experience
CC'03 Proceedings of the 12th international conference on Compiler construction
Decoupled software pipelining creates parallelization opportunities
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Exploiting statistical correlations for proactive prediction of program behaviors
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Data dependence analysis for the parallelization of numerical tree codes
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Hierarchical program representation for program element matching
IDEAL'07 Proceedings of the 8th international conference on Intelligent data engineering and automated learning
Auto-parallelisation of sieve C++ programs
Euro-Par'07 Proceedings of the 2007 conference on Parallel processing
Speculative parallelization using state separation and multiple value prediction
Proceedings of the 2010 international symposium on Memory management
Experiences in initiating concurrency software research efforts
Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 2
Model-guided empirical tuning of loop fusion
International Journal of High Performance Systems Architecture
Mapping loop nests to multipipelined architecture
Programming and Computing Software
A profile-based tool for finding pipeline parallelism in sequential programs
Parallel Computing
Simple section interchange and properties of non-computable functions
Science of Computer Programming
Runtime Reconfiguration of Multiprocessors Based on Compile-Time Analysis
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Computation mapping for multi-level storage cache hierarchies
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Reducing task creation and termination overhead in explicitly parallel programs
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
An OpenCL framework for heterogeneous multicores with local memory
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
The Paralax infrastructure: automatic parallelization with a helping hand
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Parallel programming must be deterministic by default
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
An input-centric paradigm for program dynamic optimizations
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Parallel inclusion-based points-to analysis
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Inferring arbitrary distributions for data and computation
Proceedings of the ACM international conference companion on Object oriented programming systems languages and applications companion
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Application of if-conversion to verification and optimization of workflows
Programming and Computing Software
Engineering scalable, cache and space efficient tries for strings
The VLDB Journal — The International Journal on Very Large Data Bases
Compilers, architectures and synthesis for embedded computing: retrospect and prospect
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Exposing tunable parameters in multi-threaded numerical code
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Scalable SMT-based verification of GPU kernel functions
Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
Extracting both affine and non-linear synchronization-free slices in program loops
PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
Automatic program parallelization for multicore processors
PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Estimating and exploiting potential parallelism by source-level dependence profiling
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Source-to-source optimization of CUDA C for GPU accelerated cardiac cell modeling
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Register allocation with instruction scheduling for VLIW-architectures
Programming and Computing Software
A performance model for fine-grain accesses in UPC
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Redesigning the string hash table, burst trie, and BST to exploit cache
Journal of Experimental Algorithmics (JEA)
Automatic memory partitioning and scheduling for throughput and power optimization
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Automatic Parallelization in a Binary Rewriter
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Scalable Speculative Parallelization on Commodity Clusters
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Efficient Selection of Vector Instructions Using Dynamic Programming
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
GLOpenCL: OpenCL support on hardware- and software-managed cache multicores
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Energy-Aware Loop Parallelism Maximization for Multi-core DSP Architectures
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Patterns for cache optimizations on multi-processor machines
Proceedings of the 2010 Workshop on Parallel Programming Patterns
Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering
Journal of Computational Physics
VECPAR'10 Proceedings of the 9th international conference on High performance computing for computational science
Lowering STM overhead with static analysis
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Loop Distribution and Fusion with Timing and Code Size Optimization
Journal of Signal Processing Systems
Importance of explicit vectorization for CPU and GPU software performance
Journal of Computational Physics
Parallel Low-Storage Runge-Kutta Solvers for ODE Systems with Limited Access Distance
International Journal of High Performance Computing Applications
SSIP'05 Proceedings of the 5th WSEAS international conference on Signal, speech and image processing
Data layout transformation for stencil computations on short-vector SIMD architectures
CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Commutative set: a language extension for implicit parallel programming
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
The tao of parallelism in algorithms
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Parallelism orchestration using DoPE: the degree of parallelism executive
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Automatic parallelization via matrix multiplication
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
ALTER: exploiting breakable dependences for parallelization
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Automatic SIMD vectorization of fast fourier transforms for the larrabee and AVX instruction sets
Proceedings of the international conference on Supercomputing
The Journal of Supercomputing
Efficient stack distance computation for priority replacement policies
Proceedings of the 8th ACM International Conference on Computing Frontiers
Understanding stencil code performance on multicore architectures
Proceedings of the 8th ACM International Conference on Computing Frontiers
A reuse-aware prefetching scheme for scratchpad memory
Proceedings of the 48th Design Automation Conference
Journal of Systems Architecture: the EUROMICRO Journal
Journal of Computational and Applied Mathematics
Symmetry-aware predicate abstraction for shared-variable concurrent programs
CAV'11 Proceedings of the 23rd international conference on Computer aided verification
The Journal of Supercomputing
Localizing globals and statics to make C programs thread-safe
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Probabilistically accurate program transformations
SAS'11 Proceedings of the 18th international conference on Static analysis
Safe parallel programming using dynamic dependence hints
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Enhancing locality for recursive traversals of recursive structures
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Combining measures for temporal and spatial locality
ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
Applying data copy to improve memory performance of general array computations
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Automatic measurement of instruction cache capacity
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Combined ILP and register tiling: analytical model and optimization framework
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Analytic models and empirical search: a hybrid approach to code optimization
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Parallelization of utility programs based on behavior phase analysis
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
PLDS: Partitioning linked data structures for parallelism
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Using machine learning to improve automatic vectorization
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
A data locality methodology for matrix---matrix multiplication algorithm
The Journal of Supercomputing
Parallelisation of sequential programs by invasive composition and aspect weaving
APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies
Programmable data dependencies and placements
DAMP '12 Proceedings of the 7th workshop on Declarative aspects and applications of multicore programming
Loop distribution and fusion with timing and code size optimization for embedded DSPs
EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
Induction variable analysis with delayed abstractions
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Combined loop transformation and hierarchy allocation for data reuse optimization
Proceedings of the International Conference on Computer-Aided Design
Massively parallel programming models used as hardware description languages: the OpenCL case
Proceedings of the International Conference on Computer-Aided Design
Limits of parallelism using dynamic dependency graphs
WODA '09 Proceedings of the Seventh International Workshop on Dynamic Analysis
Optimization of dense matrix multiplication on IBM cyclops-64: challenges and experiences
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Generalized index-set splitting
CC'05 Proceedings of the 14th international conference on Compiler Construction
Verification of source code transformations by program equivalence checking
CC'05 Proceedings of the 14th international conference on Compiler Construction
Phase-Based miss rate prediction across program inputs
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Extending the applicability of scalar replacement to multiple induction variables
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
A matrix-type for performance–portability
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Towards cache-optimized multigrid using patch-adaptive relaxation
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Automatic detection of saturation and clipping idioms
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Efficient SIMD code generation for irregular kernels
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
SIMD defragmenter: efficient ILP realization on data-parallel architectures
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
WSQuery: XQuery for web services integration
DASFAA'05 Proceedings of the 10th international conference on Database Systems for Advanced Applications
Performance and scalability analysis of cray x1 vectorization and multistreaming optimization
ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part I
TVOC: a translation validator for optimizing compilers
CAV'05 Proceedings of the 17th international conference on Computer Aided Verification
Finding basic block and variable correspondence
SAS'05 Proceedings of the 12th international conference on Static Analysis
An algorithm of automatic workflow optimization
Programming and Computing Software
The polyhedral model is more widely applicable than you think
CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Vapor SIMD: Auto-vectorize once, run everywhere
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Neighborhood-aware data locality optimization for NoC-based multicores
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Differential precondition checking: A lightweight, reusable analysis for refactoring tools
ASE '11 Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering
Optimizing data shuffling in data-parallel computation by understanding user-defined functions
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
The HELIX project: overview and directions
Proceedings of the 49th Annual Design Automation Conference
Optimizing memory hierarchy allocation with loop transformations for high-level synthesis
Proceedings of the 49th Annual Design Automation Conference
Scout: a source-to-source transformator for SIMD-Optimizations
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing - Volume 2
Automatic privatization for parallel execution of loops
ICAISC'12 Proceedings of the 11th international conference on Artificial Intelligence and Soft Computing - Volume Part II
Parcae: a system for flexible parallel execution
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Dynamic trace-based analysis of vectorization potential of applications
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Logical inference techniques for loop parallelization
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
HELIX: automatic parallelization of irregular programs for chip multiprocessing
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Hierarchical overlapped tiling
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Program analysis and transformation for holistic optimization of database applications
Proceedings of the ACM SIGPLAN International Workshop on State of the Art in Java Program analysis
Automatic restructuring of GPU kernels for exploiting inter-thread data locality
CC'12 Proceedings of the 21st international conference on Compiler Construction
Fast loop-level data dependence profiling
Proceedings of the 26th ACM international conference on Supercomputing
Performance analysis of Intel multiprocessors using astrophysics simulations
Concurrency and Computation: Practice & Experience
Delta Send-Recv for Dynamic Pipelining in MPI Programs
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Parallelization of the discrete chaotic block encryption algorithm
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part II
Financial software on GPUs: between Haskell and Fortran
Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing
Static detection of loop-invariant data structures
ECOOP'12 Proceedings of the 26th European conference on Object-Oriented Programming
From sequential programming to flexible parallel execution
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Automatically enhancing locality for tree traversals with traversal splicing
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Spotting code optimizations in data-parallel pipelines through PeriSCOPE
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Improving Data Locality for Efficient In-Core Path Tracing
Computer Graphics Forum
Compiler-in-the-loop exploration during datapath synthesis for higher quality delay-area trade-offs
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special section on adaptive power management for energy and temperature-aware computing systems
Model-based testing of optimizing compilers
TestCom'07/FATES'07 Proceedings of the 19th IFIP TC6/WG6.1 international conference, and 7th international conference on Testing of Software and Communicating Systems
Delayed side-effects ease multi-core programming
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
Locality optimized shared-memory implementations of iterated runge-kutta methods
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
Concurrency and Computation: Practice & Experience
Layout-oblivious compiler optimization for matrix computations
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Improved loop tiling based on the removal of spurious false dependences
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Polyhedral parallel code generation for CUDA
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Exact dependence analysis for increased communication overlap
EuroMPI'12 Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface
Memory partitioning and scheduling co-optimization in behavioral synthesis
Proceedings of the International Conference on Computer-Aided Design
From relational verification to SIMD loop synthesis
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
A Transformation Framework for Optimizing Task-Parallel Programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
HOTL: a higher order theory of locality
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Practical automatic loop specialization
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Parallel execution of Java loops on Graphics Processing Units
Science of Computer Programming
Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Memory reuse optimizations in the R-Stream compiler
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
When polyhedral transformations meet SIMD code generation
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Pacman: program-assisted cache management
Proceedings of the 2013 international symposium on memory management
Proceedings of the 6th International Systems and Storage Conference
Throughput-oriented kernel porting onto FPGAs
Proceedings of the 50th Annual Design Automation Conference
Runtime dependency analysis for loop pipelining in high-level synthesis
Proceedings of the 50th Annual Design Automation Conference
Fast condensation of the program dependence graph
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
A survey of pipelined workflow scheduling: Models and algorithms
ACM Computing Surveys (CSUR)
A T2 graph-reduction approach to fusion
Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
Semi-automatic restructuring of offloadable tasks for many-core accelerators
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Efficient compilation of CUDA kernels for high-performance computing on FPGAs
ACM Transactions on Embedded Computing Systems (TECS) - Special issue on application-specific processors
Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Automatic OpenCL work-group size selection for multicore CPUs
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Multifrontal QR factorization for multicore architectures over runtime systems
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
An automatic thread decomposition approach for pipelined multithreading
International Journal of High Performance Computing and Networking
ASC: automatically scalable computation
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Non-affine Extensions to Polyhedral Code Generation
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Alias control for deterministic parallelism
Aliasing in Object-Oriented Programming
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential
ACM Transactions on Architecture and Code Optimization (TACO)
The Cetus Source-to-Source Compiler Infrastructure: Overview and Evaluation
International Journal of Parallel Programming
Recovering memory access patterns of executable programs
Science of Computer Programming
Optimal eviction policies for stochastic address traces
Theoretical Computer Science
High level transforms for SIMD and low-level computer vision algorithms
Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing
Integrating profile-driven parallelism detection and machine-learning-based mapping
ACM Transactions on Architecture and Code Optimization (TACO)
Accelerating sequential programs on commodity multi-core processors
Journal of Parallel and Distributed Computing
Hi-index | 0.01 |
Modern computer architectures designed with high-performance microprocessors offer tremendous potential gains in performance over previous designs. Yet their very complexity makes it increasingly difficult to produce efficient code and to realize their full potential. This landmark text from two leaders in the field focuses on the pivotal role that compilers can play in addressing this critical issue. The basis for all the methods presented in this book is data dependence, a fundamental compiler analysis tool for optimizing programs on high-performance microprocessors and parallel architectures. It enables compiler designers to write compilers that automatically transform simple, sequential programs into forms that can exploit special features of these modern architectures. The text provides a broad introduction to data dependence, to the many transformation strategies it supports, and to its applications to important optimization problems such as parallelization, compiler memory hierarchy management, and instruction scheduling. The authors demonstrate the importance and wide applicability of dependence-based compiler optimizations and give the compiler writer the basics needed to understand and implement them. They also offer cookbook explanations for transforming applications by hand to computational scientists and engineers who are driven to obtain the best possible performance of their complex applications.