Automatic partitioning of a program dependence graph into parallel tasks
IBM Journal of Research and Development
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Communication optimization and code generation for distributed memory machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Optimal weighted loop fusion for parallel programs
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Automatic selection of high-order transformations in the IBM XL FORTRAN compilers
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
The implementation and evaluation of fusion and contraction in array languages
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Loop fusion in high performance Fortran
ICS '98 Proceedings of the 12th international conference on Supercomputing
SMARTS: exploiting temporal locality and parallelism through vertical execution
ICS '99 Proceedings of the 13th international conference on Supercomputing
Locality optimizations for multi-level caches
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Proceedings of the 14th international conference on Supercomputing
Data locality enhancement by memory reduction
ICS '01 Proceedings of the 15th international conference on Supercomputing
Loop optimization for a class of memory-constrained computations
ICS '01 Proceedings of the 15th international conference on Supercomputing
Blocking and array contraction across arbitrarily nested loops using affine partitioning
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Loop fusion for memory space optimization
Proceedings of the 14th international symposium on Systems synthesis
Space-time trade-off optimization for a class of electronic structure calculations
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
International Journal of Parallel Programming
Loop Restructuring for Data I/O Minimization on Limited On-Chip Memory Embedded Processors
IEEE Transactions on Computers
HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
Improving Effective Bandwidth through Compiler Enhancement of Global Cache Reuse
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Data I/O Minimization for Loops on Limited Onchip Memory Processors
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
A Framework for Loop Distribution on Limited On-Chip Memory Processors
CC '00 Proceedings of the 9th International Conference on Compiler Construction
Single Assignment C: efficient support for high-level array operations in a functional setting
Journal of Functional Programming
Improving effective bandwidth through compiler enhancement of global cache reuse
Journal of Parallel and Distributed Computing
Improving Data Locality by Array Contraction
IEEE Transactions on Computers
The Energy Impact of Aggressive Loop Fusion
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
The Potential of Computation Regrouping for Improving Locality
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
New Complexity Results on Array Contraction and Related Problems
Journal of VLSI Signal Processing Systems
A case for a working-set-based memory hierarchy
Proceedings of the 2nd conference on Computing frontiers
Fast and efficient searches for effective optimization-phase sequences
ACM Transactions on Architecture and Code Optimization (TACO)
Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion
International Journal of High Performance Computing Applications
A polynomial-time algorithm for memory space reduction
International Journal of Parallel Programming
Integrated Loop Optimizations for Data Locality Enhancement of Tensor Contraction Expressions
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Profitable loop fusion and tiling using model-driven empirical search
Proceedings of the 20th annual international conference on Supercomputing
On minimizing materializations of array-valued temporaries
ACM Transactions on Programming Languages and Systems (TOPLAS)
Code-size conscious pipelining of imperfectly nested loops
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Improving the parallelism of iterative methods by aggressive loop fusion
The Journal of Supercomputing
Energy minimization with loop fusion and multi-functional-unit scheduling for multidimensional DSP
Journal of Parallel and Distributed Computing
A tuning framework for software-managed memory hierarchies
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Efficient Map Portrayal Using a General-Purpose Query Language
DEXA '09 Proceedings of the 20th International Conference on Database and Expert Systems Applications
New algorithms for SIMD alignment
CC'07 Proceedings of the 16th international conference on Compiler construction
Locality enhancement by array contraction
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Model-guided empirical tuning of loop fusion
International Journal of High Performance Systems Architecture
Memory minimization for tensor contractions using integer linear programming
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
DESOLA: An active linear algebra library using delayed evaluation and runtime code generation
Science of Computer Programming
Parallel memory prediction for fused linear algebra kernels
ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Accelerating computationally intensive queries on massive earth science data: (system demonstration)
Proceedings of the EDBT/ICDT 2011 Workshop on Array Databases
A cache-conscious profitability model for empirical tuning of loop fusion
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Efficient search-space pruning for integrated fusion and tiling transformations
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Generalized index-set splitting
CC'05 Proceedings of the 14th international conference on Compiler Construction
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Memory-constrained communication minimization for a class of array computations
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Removing impediments to loop fusion through code transformations
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Iterative collective loop fusion
CC'06 Proceedings of the 15th international conference on Compiler Construction
Optimization techniques for efficient HTA programs
Parallel Computing
Hi-index | 0.01 |