Global communication analysis and optimization
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
ACM Computing Surveys (CSUR)
ACM Computing Surveys (CSUR)
Tuning the performance of I/O-intensive parallel applications
Proceedings of the fourth workshop on I/O in parallel and distributed systems: part of the federated computing research conference
An Implementation Framework for HPF Distributed Arrays on Message-Passing Parallel Computer Systems
IEEE Transactions on Parallel and Distributed Systems
Data-localization for Fortran macro-dataflow computation using partial static task assignment
ICS '96 Proceedings of the 10th international conference on Supercomputing
Eliminating redundant barrier synchronizations in rule-based programs
ICS '96 Proceedings of the 10th international conference on Supercomputing
A new algorithm for partial redundancy elimination based on SSA form
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Data-centric multi-level blocking
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Compiler and run-time support for semi-structured applications
ICS '97 Proceedings of the 11th international conference on Supercomputing
A compiler algorithm for optimizing locality in loop nests
ICS '97 Proceedings of the 11th international conference on Supercomputing
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology
ICS '97 Proceedings of the 11th international conference on Supercomputing
Compile-time minimisation of load imbalance in loop nests
ICS '97 Proceedings of the 11th international conference on Supercomputing
Proceedings of the fifth workshop on I/O in parallel and distributed systems
Resource sharing in hierarchical synthesis
ICCAD '97 Proceedings of the 1997 IEEE/ACM international conference on Computer-aided design
An approach for exploring code improving transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Design space exploration algorithm for heterogeneous multi-processor embedded system design
DAC '98 Proceedings of the 35th annual Design Automation Conference
Register promotion by sparse partial redundancy elimination of loads and stores
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
The implementation and evaluation of fusion and contraction in array languages
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Loop fusion in high performance Fortran
ICS '98 Proceedings of the 12th international conference on Supercomputing
A general algorithm for tiling the register level
ICS '98 Proceedings of the 12th international conference on Supercomputing
ICS '98 Proceedings of the 12th international conference on Supercomputing
Dependence driven execution for multiprogrammed multiprocessor
ICS '98 Proceedings of the 12th international conference on Supercomputing
Memory size estimation for multimedia applications
Proceedings of the 6th international workshop on Hardware/software codesign
On the Removal of Anti- and Output-Dependences
International Journal of Parallel Programming
Initial Results for Glacial Variable Analysis
International Journal of Parallel Programming
Using interval arithmetic the calculate data sizes for compilation to multimedia instruction sets
MULTIMEDIA '98 Proceedings of the sixth ACM international conference on Multimedia
Improving locality using loop and data transformations in an integrated framework
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Schedule-independent storage mapping for loops
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Improving Cache Locality by a Combination of Loop and Data Transformations
IEEE Transactions on Computers - Special issue on cache memory and related problems
A coordination language for mixed task and and data parallel programs
Proceedings of the 1999 ACM symposium on Applied computing
A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Code motion for explicitly parallel programs
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Static single assignment form for machine code
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
New tiling techniques to improve cache temporal locality
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
High-level semantic optimization of numerical codes
ICS '99 Proceedings of the 13th international conference on Supercomputing
An experimental evaluation of tiling and shackling for memory hierarchy management
ICS '99 Proceedings of the 13th international conference on Supercomputing
An integer linear programming approach for optimizing cache locality
ICS '99 Proceedings of the 13th international conference on Supercomputing
Selecting tile shape for minimal execution time
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Accelerating APL programs with SAC
Proceedings of the conference on APL '99 : On track to the 21st century: On track to the 21st century
The Superthreaded Processor Architecture
IEEE Transactions on Computers
Partial redundancy elimination in SSA form
ACM Transactions on Programming Languages and Systems (TOPLAS)
Nonsingular Data Transformations: Definition, Validity, and Applications
International Journal of Parallel Programming
APL '98 Proceedings of the APL98 conference on Array processing language
Probabilistic Loop Scheduling for Applications with Uncertain Execution Time
IEEE Transactions on Computers
A global communication optimization technique based on data-flow analysis and linear algebra
ACM Transactions on Programming Languages and Systems (TOPLAS)
A case for source-level transformations in MATLAB
Proceedings of the 2nd conference on Domain-specific languages
Synthesizing transformations for locality enhancement of imperfectly-nested loop nests
Proceedings of the 14th international conference on Supercomputing
ZPL: A Machine Independent Programming Language for Parallel Computers
IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
Influence of compiler optimizations on system power
Proceedings of the 37th Annual Design Automation Conference
A Transformation Approach to Derive Efficient Parallel Implementations
IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools parallel processing
Timing Analysis for Data and Wrap-Around Fill Caches
Real-Time Systems
Energy-driven integrated hardware-software optimizations using SimplePower
Proceedings of the 27th annual international symposium on Computer architecture
Statement-Level Communication-Free Partitioning Techniques for Parallelizing Compilers
The Journal of Supercomputing
A Loop Transformation Algorithm for Communication Overlapping
International Journal of Parallel Programming - Special issue on international symposium on high performance computing 1997, part I
DATE '00 Proceedings of the conference on Design, automation and test in Europe
Journal of VLSI Signal Processing Systems - Special issue on VLSI on custom computing technology
Memory system energy (poster session): influence of hardware-software optimizations
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Supporting Timing Analysis by Automatic Bounding of LoopIterations
Real-Time Systems - Special issue on worst-case execution-time analysis
Cacheminer: A Runtime Approach to Exploit Cache Locality on SMP
IEEE Transactions on Parallel and Distributed Systems
Properties and Algorithms for Unfolding of Probabilistic Data-Flow Graphs
Journal of VLSI Signal Processing Systems
IEEE Transactions on Parallel and Distributed Systems
Minimizing Data and Synchronization Costs in One-Way Communication
IEEE Transactions on Parallel and Distributed Systems
A compiler technique for improving whole-program locality
POPL '01 Proceedings of the 28th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Matching and searching analysis for parallel hardware implementation on FPGAs
FPGA '01 Proceedings of the 2001 ACM/SIGDA ninth international symposium on Field programmable gate arrays
Optimizing memory usage in the polyhedral model
ACM Transactions on Programming Languages and Systems (TOPLAS)
Transformations for imperfectly nested loops
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Tiling imperfectly-nested loop nests
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Compiler-directed selection of dynamic memory layouts
Proceedings of the ninth international symposium on Hardware/software codesign
Exploiting non-uniform reuse for cache optimization
Proceedings of the 2001 ACM symposium on Applied computing
A dynamic locality optimization algorithm for linear algebra codes
Proceedings of the 2001 ACM symposium on Applied computing
Data and memory optimization techniques for embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
ICS '01 Proceedings of the 15th international conference on Supercomputing
Data locality enhancement by memory reduction
ICS '01 Proceedings of the 15th international conference on Supercomputing
Loop optimization for a class of memory-constrained computations
ICS '01 Proceedings of the 15th international conference on Supercomputing
Optimizing locality for ODE solvers
ICS '01 Proceedings of the 15th international conference on Supercomputing
Global optimization techniques for automatic parallelization of hybrid applications
ICS '01 Proceedings of the 15th international conference on Supercomputing
Static Single Assignment Form for Message-Passing Programs
International Journal of Parallel Programming
Reducing memory requirements of nested loops for embedded systems
Proceedings of the 38th annual Design Automation Conference
Computational power of pipelined memory hierarchies
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Dynamic management of scratch-pad memory space
Proceedings of the 38th annual Design Automation Conference
Proceedings of the 38th annual Design Automation Conference
Blocking and array contraction across arbitrarily nested loops using affine partitioning
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Loop parallelization algorithms
Compiler optimizations for scalable parallel systems
Interprocedural analysis based on guarded array regions
Compiler optimizations for scalable parallel systems
Communication-free partitioning of nested loops
Compiler optimizations for scalable parallel systems
Solving alignment using elementary linear algebra
Compiler optimizations for scalable parallel systems
Compiler support for block buffering
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Communications of the ACM
Morphable Cache Architectures: Potential Benefits
OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
Source code transformation based on software cost analysis
Proceedings of the 14th international symposium on Systems synthesis
The Efficient Computation of Ownership Sets in HPF
IEEE Transactions on Parallel and Distributed Systems
A novel approach to code analysis of digital signal processing systems
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Influence of compiler optimizations on system power
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - System Level Design
IEEE Transactions on Computers
Static and Dynamic Locality Optimizations Using Integer Linear Programming
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Data Relation Vectors: A New Abstraction for Data Optimizations
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Hardware and Software Techniques for Controlling DRAM Power Modes
IEEE Transactions on Computers
On optimal temporal locality of stencil codes
Proceedings of the 2002 ACM symposium on Applied computing
Tera hardware-software cooperation
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Loop re-ordering and pre-fetching at run-time
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Automatic data and computation decomposition on distributed memory parallel computers
ACM Transactions on Programming Languages and Systems (TOPLAS)
Mapping a Single Assignment Programming Language to Reconfigurable Systems
The Journal of Supercomputing
The Journal of Supercomputing
Energy-conscious compilation based on voltage scaling
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
An energy saving strategy based on adaptive loop parallelization
Proceedings of the 39th annual Design Automation Conference
Exploiting shared scratch pad memory space in embedded multiprocessor systems
Proceedings of the 39th annual Design Automation Conference
Exploiting operation level parallelism through dynamically reconfigurable datapaths
Proceedings of the 39th annual Design Automation Conference
Compiler-directed scratch pad memory hierarchy design and management
Proceedings of the 39th annual Design Automation Conference
Proceedings of the 39th annual Design Automation Conference
Optimal organizations for pipelined hierarchical memories
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Synthesizing Transformations for Locality Enhancement of Imperfectly-Nested Loop Nests
International Journal of Parallel Programming
Register tiling in nonrectangular iteration spaces
ACM Transactions on Programming Languages and Systems (TOPLAS)
Memory Design and Exploration for Low Power, Embedded Systems
Journal of VLSI Signal Processing Systems - Special issue on signal processing systems design and implementation
Automated design synthesis and partitioning for adaptive reconfigurable hardware
Hardware implementation of intelligent systems
Optimizing inter-nest data locality
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Increasing temporal locality with skewing and recursive blocking
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Automatic intra-register vectorization for the Intel architecture
International Journal of Parallel Programming
An I/O-Conscious Tiling Strategy for Disk-Resident Data Sets
The Journal of Supercomputing
Precise Data Locality Optimization of Nested Loops
The Journal of Supercomputing
Parallel symbolic computation in ACE
Annals of Mathematics and Artificial Intelligence
Correctness properties in a shared-memory parallel language
Journal of the ACM (JACM)
Improving memory energy using access pattern classification
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Compiler Support for Array Distribution onNUMA Shared Memory Multiprocessors
The Journal of Supercomputing
Processor Array Synthesis from Shift-Variant Deep Nested Do Loops
The Journal of Supercomputing
Unified Interprocedural Parallelism Detection
International Journal of Parallel Programming
Compilation Techniques for Multimedia Processors
International Journal of Parallel Programming
A Vectorizing Compiler for Multimedia Extensions
International Journal of Parallel Programming
Path Analysis and Renaming for Predicated Instruction Scheduling
International Journal of Parallel Programming
Combining Loop Transformations Considering Caches and Scheduling
International Journal of Parallel Programming
Reuse-Driven Tiling for Improving Data Locality
International Journal of Parallel Programming
Data-Centric Transformations for Locality Enhancement
International Journal of Parallel Programming
Automatic Intra-Register Vectorization for the Intel® Architecture
International Journal of Parallel Programming
A Layout-Conscious Iteration Space Transformation Technique
IEEE Transactions on Computers
Loop Restructuring for Data I/O Minimization on Limited On-Chip Memory Embedded Processors
IEEE Transactions on Computers
A Parallelization Domain Oriented Multilevel Graph Partitioner
IEEE Transactions on Computers
Generation of Injective and Reversible Modular Mappings
IEEE Transactions on Parallel and Distributed Systems
Parallelizing graph construction operations in programs with cyclic graphs
Parallel Computing
Evaluating Integrated Hardware-Software Optimizations Using a Unified Energy Estimation Framework
IEEE Transactions on Computers
Parallel multiplication of a vector by a kronecker product of matrices
Parallel numerical linear algebra
ESOP '02 Proceedings of the 11th European Symposium on Programming Languages and Systems
Rescheduling for Locality in Sparse Matrix Computations
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
A Correction Method for Parallel Loop Execution
ICCS '02 Proceedings of the International Conference on Computational Science-Part I
Automatic generation of injective modular mappings
ICPP '97 Proceedings of the international Conference on Parallel Processing
Improving the Performance of Out-of-Core Computations
ICPP '97 Proceedings of the international Conference on Parallel Processing
Sassy: A Language and Optimizing Compiler for Image Processing on Reconfigurable Computing Systems
ICVS '99 Proceedings of the First International Conference on Computer Vision Systems
Mapping Techniques for Parallel Evaluation of Chains of Recurrences
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Compiler-Directed I/O Optimization
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A Compile-Time Partitioning Strategy for Non-Rectangular Loop Nests
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Compiler and Runtime Support for Irregular Reductions on a Multithreaded Architecture
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Next Generation System Software for Future High-End Computing Systems
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Pipelining Wavefront Computations: Experiences and Performance
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
A Loop Transformation Algorithm Based on Explicit Data Layout Representation for Optimizing Locality
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
I/O Granularity Transformations
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Automatic Analysis of Loops to Exploit Operator Parallelism on Reconfigurable Systems
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
An Analytical Comparison of the I-Test and Omega Test
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Language Support for Pipelining Wavefront Computations
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
A Compiler Framework for Tiling Imperfectly-Nested Loops
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Designing the Agassiz Compiler for Concurrent Multithreaded Architectures
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
A Comparative Analysis of Dependence Testing Mechanisms
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Experimental Evaluation of Energy Behavior of Iteration Space Tiling
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Automatic Coarse Grain Task Parallel Processing on SMP Using OpenMP
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Compiler-Directed Dynamic Frequency and Voltage Scheduling
PACS '00 Proceedings of the First International Workshop on Power-Aware Computer Systems-Revised Papers
A Technique for Parallel Loop Execution
PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing
Software Bubbles: Using Predication to Compensate for Aliasing in Software Pipelines
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Interprocedural Transformations for Extracting Maximum Parallelism
ADVIS '02 Proceedings of the Second International Conference on Advances in Information Systems
A Characterization of Temporal Locality and Its Portability across Memory Hierarchies
ICALP '01 Proceedings of the 28th International Colloquium on Automata, Languages and Programming,
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Generation of Distributed Loop Control
Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation - SAMOS
Complexity of Multi-dimensional Loop Alignment
STACS '02 Proceedings of the 19th Annual Symposium on Theoretical Aspects of Computer Science
Parameter-Induced Aliasing in Ada
Ada Europe '01 Proceedings of the 6th Ade-Europe International Conference Leuven on Reliable Software Technologies
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Temporary Arrays for Distribution of Loops with Control Dependences
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Data Sequence Locality: A Generalization of Temporal Locality
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Heterogeneous Clustered Processors: Organisation and Design
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
I/O-Conscious Tiling for Disk-Resident Data Sets
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Scheduling Iterative Programs onto LogP-Machine
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Automatic SIMD Parallelization of Embedded Applications Based on Pattern Recognition
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Verification of Basic Block Schedules Using RTL Transformations
CHARME '01 Proceedings of the 11th IFIP WG 10.5 Advanced Research Working Conference on Correct Hardware Design and Verification Methods
Enhancing Compiler Techniques for Memory Energy Optimizations
EMSOFT '02 Proceedings of the Second International Conference on Embedded Software
Towards Energy-Aware Iteration Space Tiling
LCTES '00 Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems
A Holistic Approach to System Level Energy Optimization
PATMOS '00 Proceedings of the 10th International Workshop on Integrated Circuit Design, Power and Timing Modeling, Optimization and Simulation
Using Cohort-Scheduling to Enhance Server Performance
ATEC '02 Proceedings of the General Track of the annual conference on USENIX Annual Technical Conference
Fractal Matrix Multiplication: A Case Study on Portability of Cache Performance
WAE '01 Proceedings of the 5th International Workshop on Algorithm Engineering
A Framework for Loop Distribution on Limited On-Chip Memory Processors
CC '00 Proceedings of the 9th International Conference on Compiler Construction
Advanced Scalarization of Array Syntax
CC '00 Proceedings of the 9th International Conference on Compiler Construction
Automatic Removal of Array Memory Leaks in Java
CC '00 Proceedings of the 9th International Conference on Compiler Construction
Software Pipelining of Nested Loops
CC '01 Proceedings of the 10th International Conference on Compiler Construction
Efficient Symbolic Analysis for Optimizing Compilers
CC '01 Proceedings of the 10th International Conference on Compiler Construction
Influence of Loop Optimizations on Energy Consumption of Multi-bank Memory Systems
CC '02 Proceedings of the 11th International Conference on Compiler Construction
A Case Study: Effects of WITH-Loop-Folding on the NAS Benchmark MG in SAC
IFL '98 Selected Papers from the 10th International Workshop on 10th International Workshop
Improving Locality in Out-of-Core Computations Using Data Layout Transformations
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Static Analysis for Guarded Code
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
High Level Programming Methodologies for Data Intensive Computations
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Optimizing Mutual Exclusion Synchronization in Explicitly Parallel Programs
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
An Efficient Technique of Instruction Scheduling on a Superscalar-Based Mulprocessor
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Generation of distributed loop control
Embedded processor design challenges
Algorithms for computing the static single assignment form
Journal of the ACM (JACM)
On the Parallel Execution Time of Tiled Loops
IEEE Transactions on Parallel and Distributed Systems
Reducing False Sharing and Improving Spatial Locality in a Unified Compilation Framework
IEEE Transactions on Parallel and Distributed Systems
QR factorization for shared memory and message passing
Parallel Computing
Dynamic compilation for energy adaptation
Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design
Locality-conscious process scheduling in embedded systems
Proceedings of the tenth international symposium on Hardware/software codesign
Compiler-directed instruction cache leakage optimization
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 40th annual Design Automation Conference
METRIC: tracking down inefficiencies in the memory hierarchy via binary rewriting
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
TEST: a tracer for extracting speculative threads
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Optimization of Data Distribution and Processor Allocation Problem Using Simulated Annealing
The Journal of Supercomputing
A compiler approach for reducing data cache energy
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
A GSA-based compiler infrastructure to extract parallelism from complex loops
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Compiler optimizations for low power systems
Power aware computing
Address code generation for DSP instruction-set architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Cone Based Clustering for List Scheduling Algorithms
EDTC '97 Proceedings of the 1997 European conference on Design and Test
Pipeline Vectorization for Reconfigurable Systems
FCCM '99 Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Automatic Synthesis of Data Storage and Control Structures for FPGA-Based Computing Engines
FCCM '00 Proceedings of the 2000 IEEE Symposium on Field-Programmable Custom Computing Machines
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
A transformation method to reduce loop overhead in HPF compiler
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Detection of Implicit Parallelisms in the Task Parallel Language
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
A New Transformation Method to Generate Optimized DO Loop from FORALL Construct
PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
Compiler-Directed Array Interleaving for Reducing Energy in Multi-Bank Memories
ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
Strategies for Improving Data Locality in Embedded Applications
ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
Identifying parallelism in programs with cyclic graphs
Journal of Parallel and Distributed Computing
Extracting Parallelism in Nested Loops
COMPSAC '96 Proceedings of the 20th Conference on Computer Software and Applications
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements
IEEE Transactions on Computers
Mapping deep nested do-loop DSP algorithms to large scale FPGA array structures
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
ACM Transactions on Programming Languages and Systems (TOPLAS)
Programming skills for a changing world: back to the basics
Journal of Computing Sciences in Colleges
Vectorizing for a SIMdD DSP architecture
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
A scalable wide-issue clustered VLIW with a reconfigurable interconnect
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Exploiting bank locality in multi-bank memories
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Array Regrouping and Its Use in Compiling Data-Intensive Embedded Applications
IEEE Transactions on Computers
IEEE Transactions on Parallel and Distributed Systems
A Quantitative Analysis of Tile Size Selection Algorithms
The Journal of Supercomputing
Parallel Processing of First Order Linear Recurrence on SMP Machines
The Journal of Supercomputing
What can we gain by unfolding loops?
ACM SIGPLAN Notices
Single Assignment C: efficient support for high-level array operations in a functional setting
Journal of Functional Programming
Impact of Data Transformations on Memory Bank Locality
Proceedings of the conference on Design, automation and test in Europe - Volume 1
Access Pattern Restructuring for Memory Energy
IEEE Transactions on Parallel and Distributed Systems
Linear data distribution based on index analysis
High performance scientific and engineering computing
Reducing instruction cache energy consumption using a compiler-based strategy
ACM Transactions on Architecture and Code Optimization (TACO)
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Storage requirement estimation for optimized design of data intensive applications
ACM Transactions on Design Automation of Electronic Systems (TODAES)
LODS: locality-oriented dynamic scheduling for on-chip multiprocessors
Proceedings of the 41st annual Design Automation Conference
Data compression for improving SPM behavior
Proceedings of the 41st annual Design Automation Conference
A unified framework for nonlinear dependence testing and symbolic analysis
Proceedings of the 18th annual international conference on Supercomputing
Synthesis of Heterogeneous Distributed Architectures for Memory-Intensive Applications
Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
Array Composition and Decomposition for Optimizing Embedded Applications
Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
An innovative low-power high-performance programmable signal processor for digital communications
IBM Journal of Research and Development
Compiler-directed code restructuring for reducing data TLB energy
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Improving Data Locality by Array Contraction
IEEE Transactions on Computers
An extended ANSI C for processors with a multimedia extension
International Journal of Parallel Programming
Runtime Code Parallelization for On-Chip Multiprocessors
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Generalized Data Transformations for Enhancing Cache Behavior
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
General loop fusion technique for nested loops considering timing and code size
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Quasidynamic Layout Optimizations for Improving Data Locality
IEEE Transactions on Parallel and Distributed Systems
Journal of Systems Architecture: the EUROMICRO Journal
Automatic tiling of iterative stencil loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimizing Address Code Generation for Array-Intensive DSP Applications
Proceedings of the international symposium on Code generation and optimization
A Constraint Network Based Approach to Memory Layout Optimization
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
A Compiler Analysis of Interprocedural Data Communication
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
New Complexity Results on Array Contraction and Related Problems
Journal of VLSI Signal Processing Systems
Exploitation of parallelism to nested loops with dependence cycles
Journal of Systems Architecture: the EUROMICRO Journal
Optimizing Array-Intensive Applications for On-Chip Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Impact of Compiler-based Data-Prefetching Techniques on SPEC OMP Application Performance
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
An Application Analysis Framework For Polymorphic Chip Multiprocessors
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Software-Directed Disk Power Management for Scientific Applications
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Bandwidth Management with a Reconfigurable Data Cache
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
Journal of Systems and Software - Special issue: Software engineering education and training
Toward an automatic parallelization of sparse matrix computations
Journal of Parallel and Distributed Computing
A sample-based cache mapping scheme
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Compiling for memory emergency
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
A linear-time algorithm for optimal barrier placement
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
A novel approach for partitioning iteration spaces with variable densities
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Exposing disk layout to compiler for reducing energy consumption of parallel disk based systems
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Shared memory multiprocessor support for functional array processing in SAC
Journal of Functional Programming
Data space-oriented tiling for enhancing locality
ACM Transactions on Embedded Computing Systems (TECS)
Reducing 3D Fast Wavelet Transform Execution Time Using Blocking and the Streaming SIMD Extensions
Journal of VLSI Signal Processing Systems
Generating cache hints for improved program efficiency
Journal of Systems Architecture: the EUROMICRO Journal
Improving whole-program locality using intra-procedural and inter-procedural transformations
Journal of Parallel and Distributed Computing
Encyclopedia of Computer Science
An evaluation of code and data optimizations in the context of disk power reduction
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Dataflow analysis for energy-efficient scratch-pad memory management
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
A polynomial-time algorithm for memory space reduction
International Journal of Parallel Programming
Automatic array partitioning based on the Smith normal form
International Journal of Parallel Programming
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Contributions to the GNU compiler collection
IBM Systems Journal
Facilitating the search for compositions of program transformations
Proceedings of the 19th annual international conference on Supercomputing
TAPE: a transactional application profiling environment
Proceedings of the 19th annual international conference on Supercomputing
Disk layout optimization for reducing energy consumption
Proceedings of the 19th annual international conference on Supercomputing
Obtaining Affine Transformations to Improve Locality of Loop Nests
Programming and Computing Software
Exploiting Vector Parallelism in Software Pipelined Loops
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Hierarchical submission in a Grid environment
MGC '05 Proceedings of the 3rd international workshop on Middleware for grid computing
High-level synthesis using computation-unit integrated memories
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
Compiler-Guided data compression for reducing memory consumption of embedded applications
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
International Journal of Parallel Programming
Data dependence analysis techniques for increased accuracy and extracted parallelism
International Journal of Parallel Programming - Special issue II: The 17th annual international conference on supercomputing (ICS'03)
On combining iteration space tiling with data space tiling for scratch-pad memory systems
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Multi-platform Auto-vectorization
Proceedings of the International Symposium on Code Generation and Optimization
Improving the energy behavior of block buffering using compiler optimizations
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Shared Scratch-Pad Memory Space Management
ISQED '06 Proceedings of the 7th International Symposium on Quality Electronic Design
Energy-aware data prefetching for multi-speed disks
Proceedings of the 3rd conference on Computing frontiers
Multi-compilation: capturing interactions among concurrently-executing applications
Proceedings of the 3rd conference on Computing frontiers
Compiler-directed voltage scaling on communication links for reducing power consumption
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Improving scratch-pad memory reliability through compiler-guided data block duplication
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Dynamic scratch-pad memory management for irregular array access patterns
Proceedings of the conference on Design, automation and test in Europe: Proceedings
A compiler for exploiting nested parallelism in OpenMP programs
Parallel Computing - OpenMp
Reducing code size through address register assignment
ACM Transactions on Embedded Computing Systems (TECS)
Auto-vectorization of interleaved data for SIMD
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Optimizing locality and scalability of embedded Runge--Kutta solvers using block-based pipelining
Journal of Parallel and Distributed Computing
Reducing energy consumption of multiprocessor SoC architectures by exploiting memory bank locality
ACM Transactions on Design Automation of Electronic Systems (TODAES)
An empirical evaluation of chains of recurrences for array dependence testing
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Journal of Electronic Testing: Theory and Applications
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies
International Journal of Parallel Programming
SAC: a functional array language for efficient multi-threaded execution
International Journal of Parallel Programming
Proceedings of the 20th annual international conference on Supercomputing
Proceedings of the 44th annual Southeast regional conference
On minimizing materializations of array-valued temporaries
ACM Transactions on Programming Languages and Systems (TOPLAS)
A memory model for scientific algorithms on graphics processors
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
FFT program generation for shared memory: SMP and multicore
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Tight analysis of the performance potential of thread speculation using spec CPU 2006
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Journal of VLSI Signal Processing Systems
Journal of VLSI Signal Processing Systems
Efficient control generation for mapping nested loop programs onto processor arrays
Journal of Systems Architecture: the EUROMICRO Journal
Incorporating Intel® MMX$^{\rm TM}$ technology into a Java$^{\rm TM}$ JIT compiler$^{1}$
Scientific Programming
A transparent runtime data distribution engine for OpenMP
Scientific Programming
Case study on algebraic software methodologies for scientific computing
Scientific Programming
Compiler optimization techniques for OpenMP programs
Scientific Programming
NINJA: Java for high performance numerical computing
Scientific Programming
Interprocedural definition-use chains of dynamic pointer-linked data structures
Scientific Programming
Improving locality for ODE solvers by program transformations
Scientific Programming
Reducing off-chip memory access via stream-conscious tiling on multimedia applications
International Journal of Parallel Programming
A Dimension Abstraction Approach to Vectorization in Matlab
Proceedings of the International Symposium on Code Generation and Optimization
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time
Proceedings of the International Symposium on Code Generation and Optimization
Compiler-Directed Variable Latency Aware SPM Management to CopeWith Timing Problems
Proceedings of the International Symposium on Code Generation and Optimization
Interactive presentation: A process splitting transformation for Kahn process networks
Proceedings of the conference on Design, automation and test in Europe
Memory bank aware dynamic loop scheduling
Proceedings of the conference on Design, automation and test in Europe
A case for source-level transformations in MATLAB
DSL'99 Proceedings of the 2nd conference on Conference on Domain-Specific Languages - Volume 2
Sensitivity analysis for automatic parallelization on multi-cores
Proceedings of the 21st annual international conference on Supercomputing
A memory-conscious code parallelization scheme
Proceedings of the 44th annual Design Automation Conference
Designer-controlled generation of parallel and flexible heterogeneous MPSoC specification
Proceedings of the 44th annual Design Automation Conference
Design and DSP implementation of fixed-point systems
EURASIP Journal on Applied Signal Processing
Efficient implementation of nested-loop multimedia algorithms
EURASIP Journal on Applied Signal Processing
Forma: A framework for safe automatic array reshaping
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiler-Directed Energy Optimization for Parallel Disk Based Systems
IEEE Transactions on Parallel and Distributed Systems
Cache-efficient numerical algorithms using graphics hardware
Parallel Computing
Canonical scattered context generators of sentences with their parses
Theoretical Computer Science
Improving the parallelism of iterative methods by aggressive loop fusion
The Journal of Supercomputing
Quantifying ILP by means of graph theory
Proceedings of the 2nd international conference on Performance evaluation methodologies and tools
Energy minimization with loop fusion and multi-functional-unit scheduling for multidimensional DSP
Journal of Parallel and Distributed Computing
Optimization of memory system in real-time embedded systems
ICCOMP'07 Proceedings of the 11th WSEAS International Conference on Computers
Foundations for the integration of scheduling techniques into compilers for parallel languages
International Journal of Computational Science and Engineering
International Journal of Computational Science and Engineering
Improving I/O performance of applications through compiler-directed code restructuring
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Compiling for an indirect vector register architecture
Proceedings of the 5th conference on Computing frontiers
Iterative optimization in the polyhedral model: part ii, multidimensional time
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A general data dependence analysis for parallelizing compilers
The Journal of Supercomputing
XARK: An extensible framework for automatic recognition of computational kernels
ACM Transactions on Programming Languages and Systems (TOPLAS)
Prefetch throttling and data pinning for improving performance of shared caches
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Adaptive Loop Tiling for a Multi-cluster CMP
ICA3PP '08 Proceedings of the 8th international conference on Algorithms and Architectures for Parallel Processing
A Systematic Approach to Automatically Generate Multiple Semantically Equivalent Program Versions
Ada-Europe '08 Proceedings of the 13th Ada-Europe international conference on Reliable Software Technologies
Residual Checking of Safety Properties
SPIN '08 Proceedings of the 15th international workshop on Model Checking Software
Language Extensions in Support of Compiler Parallelization
Languages and Compilers for Parallel Computing
Flow-Sensitive Loop-Variant Variable Classification in Linear Time
Languages and Compilers for Parallel Computing
Control flow optimization in loops using interval analysis
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Outer-loop vectorization: revisited for short SIMD architectures
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Profiler and compiler assisted adaptive I/O prefetching for shared storage caches
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Formalizing a Framework for Dynamic Slicing of Program Dependence Graphs in Isabelle/HOL
TPHOLs '08 Proceedings of the 21st International Conference on Theorem Proving in Higher Order Logics
Algorithms and tool support for dynamic information flow analysis
Information and Software Technology
Transformations techniques for extracting parallelism in non-uniform nested loops
WSEAS Transactions on Computers
Smashing: Folding Space to Tile through Time
Languages and Compilers for Parallel Computing
Implementation of Sensitivity Analysis for Automatic Parallelization
Languages and Compilers for Parallel Computing
Cost minimization while satisfying hard/soft timing constraints for heterogeneous embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Matrix-based streamization approach for improving locality and parallelism on FT64 stream processor
The Journal of Supercomputing
A compiler-directed data prefetching scheme for chip multiprocessors
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Compiler Controlled Speculation for Power Aware ILP Extraction in Dataflow Architectures
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
A Prefetching Algorithm for Multi-speed Disks
Transactions on High-Performance Embedded Architectures and Compilers I
Streaming implementation of a sequential decompression algorithm on an FPGA
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Reducing memory requirements of resource-constrained applications
ACM Transactions on Embedded Computing Systems (TECS)
A SIMD optimization framework for retargetable compilers
ACM Transactions on Architecture and Code Optimization (TACO)
Affine and unimodular transformations for non-uniform nested loops
ICCOMP'08 Proceedings of the 12th WSEAS international conference on Computers
Optimization of tele-immersion codes
Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
Design and implementation of a queue compiler
Microprocessors & Microsystems
Parallelization Approaches for Hardware Accelerators --- Loop Unrolling Versus Loop Partitioning
ARCS '09 Proceedings of the 22nd International Conference on Architecture of Computing Systems
Cache-aware partitioning of multi-dimensional iteration spaces
SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
Synchronization optimizations for efficient execution on multi-cores
Proceedings of the 23rd international conference on Supercomputing
Chunking parallel loops in the presence of synchronization
Proceedings of the 23rd international conference on Supercomputing
On approximating the ideal random access machine by physical machines
Journal of the ACM (JACM)
Program locality analysis using reuse distance
ACM Transactions on Programming Languages and Systems (TOPLAS)
On PDG-based noninterference and its modular proof
Proceedings of the ACM SIGPLAN Fourth Workshop on Programming Languages and Analysis for Security
Refactoring sequential Java code for concurrency via concurrent libraries
ICSE '09 Proceedings of the 31st International Conference on Software Engineering
A case study on compiler optimizations for the Intel® Core™ 2 duo processor
International Journal of Parallel Programming
Analysis of imperative XML programs
Information Systems
Automatic parallelization for graphics processing units
PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
Efficient Mapping of Multiresolution Image Filtering Algorithms on Graphics Processors
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Parallel programming with object assemblies
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
A directive-based MPI code generator for Linux PC clusters
The Journal of Supercomputing
Into the Loops: Practical Issues in Translation Validation for Optimizing Compilers
Electronic Notes in Theoretical Computer Science (ENTCS)
Implementing the PGI Accelerator model
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
Optimal interprocedural program optimization: a new framework and its application
Optimal interprocedural program optimization: a new framework and its application
Algorithms for memory hierarchies: advanced lectures
Algorithms for memory hierarchies: advanced lectures
Compiling for reconfigurable computing: A survey
ACM Computing Surveys (CSUR)
Loop transformations for reducing data space requirements of resource-constrained applications
SAS'03 Proceedings of the 10th international conference on Static analysis
Address register assignment for reducing code size
CC'03 Proceedings of the 12th international conference on Compiler construction
Integrating high-level optimizations in a production compiler: design and implementation experience
CC'03 Proceedings of the 12th international conference on Compiler construction
Advanced symbolic analysis for compilers: new techniques and algorithms for symbolic program analysis and optimization
Pipelined parallelization in HPF programs on the earth simulator
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Performance evaluation of compiler controlled power saving scheme
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Simple section interchange and properties of non-computable functions
Science of Computer Programming
Computation mapping for multi-level storage cache hierarchies
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Cashing in on hints for better prefetching and caching in PVFS and MPI-IO
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
DMATiler: revisiting loop tiling for direct memory access
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Algorithmic issues in grid computing
Algorithms and theory of computation handbook
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Tailoring a self-distributing architecture to a cluster computer environment
EURO-PDP'00 Proceedings of the 8th Euromicro conference on Parallel and distributed processing
An automated approach to improve communication-computation overlap in clusters
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Automatically translating a general purpose C++ image processing library for GPUs
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A general data dependence analysis to nested loop using integer interval theory
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Dynamic multi phase scheduling for heterogeneous cluste
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Compiler-directed memory management for heterogeneous MPSoCs
Journal of Systems Architecture: the EUROMICRO Journal
Loop transformations: convexity, pruning and optimization
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Thread contracts for safe parallelism
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Code transformations for embedded reconfigurable computing architectures
GTTSE'09 Proceedings of the 3rd international summer school conference on Generative and transformational techniques in software engineering III
Compiler-guided leakage optimization for banked scratch-pad memories
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Frameworks for multi-core architectures: a comprehensive evaluation using 2D/3D image registration
ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
Loop Distribution and Fusion with Timing and Code Size Optimization
Journal of Signal Processing Systems
Parallel Low-Storage Runge-Kutta Solvers for ODE Systems with Limited Access Distance
International Journal of High Performance Computing Applications
Data layout transformation for stencil computations on short-vector SIMD architectures
CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
An automatic parallelization framework for algebraic computation systems
Proceedings of the 36th international symposium on Symbolic and algebraic computation
Efficient stack distance computation for priority replacement policies
Proceedings of the 8th ACM International Conference on Computing Frontiers
The Journal of Supercomputing
Adaptive parallel approximate similarity search for responsive multimedia retrieval
Proceedings of the 20th ACM international conference on Information and knowledge management
Compiler control power saving scheme for multi core processors
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Code transformations for one-pass analysis
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Using machine learning to improve automatic vectorization
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Beyond iteration vectors: instancewise relational abstract domains
SAS'06 Proceedings of the 13th international conference on Static Analysis
Secure execution of computations in untrusted hosts
Ada-Europe'06 Proceedings of the 11th Ada-Europe international conference on Reliable Software Technologies
A new carried-dependence self-scheduling algorithm
ICCSA'05 Proceedings of the 2005 international conference on Computational Science and its Applications - Volume Part I
Controller synthesis for mapping partitioned programs on array architectures
ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
Aggressive loop fusion for improving locality and parallelism
ISPA'05 Proceedings of the Third international conference on Parallel and Distributed Processing and Applications
An incremental compilation approach for OpenMP applications
NPC'05 Proceedings of the 2005 IFIP international conference on Network and Parallel Computing
Loop distribution and fusion with timing and code size optimization for embedded DSPs
EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
Induction variable analysis with delayed abstractions
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Proceedings of the International Conference on Computer-Aided Design
Optimization of dense matrix multiplication on IBM cyclops-64: challenges and experiences
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
On dependence analysis for SIMD enhanced processors
VECPAR'04 Proceedings of the 6th international conference on High Performance Computing for Computational Science
Generalized index-set splitting
CC'05 Proceedings of the 14th international conference on Compiler Construction
A compiler-based approach to data security
CC'05 Proceedings of the 14th international conference on Compiler Construction
Verification of source code transformations by program equivalence checking
CC'05 Proceedings of the 14th international conference on Compiler Construction
A geometric approach for partitioning n-dimensional non-rectangular iteration spaces
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
An ILP-Based approach to locality optimization
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Performance of OSCAR multigrain parallelizing compiler on SMP servers
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
A matrix-type for performance–portability
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
A study of performance scalability by parallelizing loop iterations on multi-core SMPs
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Hierarchical parallelism control for multigrain parallel processing
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Automatic detection of saturation and clipping idioms
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
A hybrid strategy based on data distribution and migration for optimizing memory locality
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Effect of optimizations on performance of OpenMP programs
HiPC'04 Proceedings of the 11th international conference on High Performance Computing
Automatic FIR filter generation for FPGAs
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Optimizing local memory allocation and assignment through a decoupled approach
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Memory space conscious loop iteration duplication for reliable execution
SAS'05 Proceedings of the 12th international conference on Static Analysis
Impact of array data flow analysis on the design of energy-efficient circuits
PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
Matrix-Based programming optimization for improving memory hierarchy performance on imagine
ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
Enhancements to policy distribution for control flow and looping
DSOM'05 Proceedings of the 16th IFIP/IEEE Ambient Networks international conference on Distributed Systems: operations and Management
Polyhedral code generation in the real world
CC'06 Proceedings of the 15th international conference on Compiler Construction
An approach for semiautomatic locality optimizations using OpenMP
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Neighborhood-aware data locality optimization for NoC-based multicores
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Analysis of pure methods using garbage collection
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Dynamic trace-based analysis of vectorization potential of applications
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Finding, expressing and managing parallelism in programs executed on clusters of workstations
Computer Communications
Optimization techniques for efficient HTA programs
Parallel Computing
Static detection of loop-invariant data structures
ECOOP'12 Proceedings of the 26th European conference on Object-Oriented Programming
Efficient backprojection-based synthetic aperture radar computation with many-core processors
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Tiling stencil computations to maximize parallelism
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Compiler-directed file layout optimization for hierarchical storage systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A framework for low-communication 1-D FFT
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Optimizing chip multiprocessor work distribution using dynamic compilation
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
Architecture-based optimization for mapping scientific applications to imagine
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
API compilation for image hardware accelerators
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Polyhedral parallel code generation for CUDA
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Automatic speculative parallelization of loops using polyhedral dependence analysis
Proceedings of the First International Workshop on Code OptimiSation for MultI and many Cores
A Transformation Framework for Optimizing Task-Parallel Programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
PolyGLoT: a polyhedral loop transformation framework for a graphical dataflow language
CC'13 Proceedings of the 22nd international conference on Compiler Construction
Parallel execution of Java loops on Graphics Processing Units
Science of Computer Programming
When polyhedral transformations meet SIMD code generation
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Location-aware cache management for many-core processors with deep cache hierarchy
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hybrid Hexagonal/Classical Tiling for GPUs
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Improving polyhedral code generation for high-level synthesis
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Recovering memory access patterns of executable programs
Science of Computer Programming
Optimal eviction policies for stochastic address traces
Theoretical Computer Science
Leveraging GPUs using cooperative loop speculation
ACM Transactions on Architecture and Code Optimization (TACO)
Compiler-directed file layout optimization for hierarchical storage systems
Scientific Programming - Selected Papers from Super Computing 2012
Efficient backprojection-based synthetic aperture radar computation with many-core processors
Scientific Programming - Selected Papers from Super Computing 2012
A framework for low-communication 1-D FFT
Scientific Programming - Selected Papers from Super Computing 2012
Hi-index | 0.05 |