Overlapped loop support in the Cydra 5
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Analysis and transformation in the ParaScope editor
ICS '91 Proceedings of the 5th international conference on Supercomputing
Interprocedural transformations for parallel code generation
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Unexpected side effects of inline substitution: a case study
ACM Letters on Programming Languages and Systems (LOPLAS)
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Improving the ratio of memory operations to floating-point operations in loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Combining loop transformations considering caches and scheduling
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Unroll-and-jam using uniformly generated sets
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Code transformations to improve memory parallelism
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Tools for application-oriented performance tuning
ICS '01 Proceedings of the 15th international conference on Supercomputing
Loop Transformations for Architectures with Partitioned Register Banks
OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
Register tiling in nonrectangular iteration spaces
ACM Transactions on Programming Languages and Systems (TOPLAS)
HPCVIEW: A Tool for Top-down Analysis of Node Performance
The Journal of Supercomputing
Combining Loop Transformations Considering Caches and Scheduling
International Journal of Parallel Programming
Achieving Scalable Locality with Time Skewing
International Journal of Parallel Programming
Interactive Parallel Programming using the ParaScope Editor
IEEE Transactions on Parallel and Distributed Systems
Time Skewing for Parallel Computers
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Compiler-Directed Dynamic Frequency and Voltage Scheduling
PACS '00 Proceedings of the First International Workshop on Power-Aware Computer Systems-Revised Papers
Optimizing Loop Performance for Clustered VLIW Architectures
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
On the Parallel Execution Time of Tiled Loops
IEEE Transactions on Parallel and Distributed Systems
Predicting whole-program locality through reuse distance analysis
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Improving effective bandwidth through compiler enhancement of global cache reuse
Journal of Parallel and Distributed Computing
An experimental evaluation of scalar replacement on scientific benchmarks
Software—Practice & Experience
Improving register allocation for subscripted variables
ACM SIGPLAN Notices - Best of PLDI 1979-1999
The Energy Impact of Aggressive Loop Fusion
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
The Potential of Computation Regrouping for Improving Locality
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Complementing software pipelining with software thread integration
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Optimizing Sparse Matrix-Vector Product Computations Using Unroll and Jam
International Journal of High Performance Computing Applications
On the decidability of phase ordering problem in optimizing compilation
Proceedings of the 3rd conference on Computing frontiers
Reaching fast code faster: using modeling for efficient software thread integration on a VLIW DSP
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
A metric space for computer programs and the principle of computational least action
The Journal of Supercomputing
Hierarchical hybrid grids: achieving TERAFLOP performance on large scale finite element simulations
International Journal of Parallel, Emergent and Distributed Systems
A Case Study of Some Issues in the Optimization of Fortran 90 Array Notation
Scientific Programming
Computational forces in the Linpack benchmark
Journal of Parallel and Distributed Computing
Roofline: an insightful visual performance model for multicore architectures
Communications of the ACM - A Direct Path to Dependable Software
Computational forces in the SAGE benchmark
Journal of Parallel and Distributed Computing
Program locality analysis using reuse distance
ACM Transactions on Programming Languages and Systems (TOPLAS)
Instruction balance and its relation to program energy consumption
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Dimensional analysis applied to a parallel QR algorithm
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Self-similarity of parallel machines
Parallel Computing
Balance principles for algorithm-architecture co-design
HotPar'11 Proceedings of the 3rd USENIX conference on Hot Topics in Parallelism
On the communication complexity of 3D FFTs and its implications for Exascale
Proceedings of the 26th ACM international conference on Supercomputing
A coldness metric for cache optimization
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
International Journal of High Performance Computing Applications
Software thread integration for instruction-level parallelism
ACM Transactions on Embedded Computing Systems (TECS)
Computer performance analysis and the Pi Theorem
Computer Science - Research and Development