A global approach to detection of parallelism
Automatic decomposition of scientific programs for parallel execution
POPL '87 Proceedings of the 14th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
A practical algorithm for exact array dependence analysis
Communications of the ACM
IEEE Transactions on Computers
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Memory bandwidth limitations of future microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Fusion of Loops for Parallelism and Locality
IEEE Transactions on Parallel and Distributed Systems
Data-centric multi-level blocking
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Using integer sets for data-parallel program analysis and optimization
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Quantifying loop nest locality using SPEC'95 and the perfect benchmarks
ACM Transactions on Computer Systems (TOCS)
Proceedings of the 14th international conference on Supercomputing
Transforming loops to recursion for multi-level memory hierarchies
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Iteration Space Slicing for Locality
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Collective Loop Fusion for Array Contraction
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
The Memory Bandwidth Bottleneck and its Amelioration by a Compiler
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Reuse Distance Analysis
Optimizing supercompilers for supercomputers
Software methods for improvement of cache performance on supercomputer applications
Improving effective bandwidth through compiler enhancement of global and dynamic cache reuse
A study of replacement algorithms for a virtual-storage computer
IBM Systems Journal
Computation regrouping: restructuring programs for temporal data cache locality
ICS '02 Proceedings of the 16th international conference on Supercomputing
Two techniques for reconciling algorithm parallelism with memory constraints
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Data Sequence Locality: A Generalization of Temporal Locality
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Better tiling and array contraction for compiling scientific programs
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Sourcebook of parallel computing
Array Regrouping and Its Use in Compiling Data-Intensive Embedded Applications
IEEE Transactions on Computers
Efficient and Accurate Analytical Modeling of Whole-Program Data Cache Behavior
IEEE Transactions on Computers
Restructuring computations for temporal data cache locality
International Journal of Parallel Programming
Improving Data Locality by Array Contraction
IEEE Transactions on Computers
Automatic tiling of iterative stencil loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
Using Scratchpad to Exploit Object Locality in Java
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Profitable loop fusion and tiling using model-driven empirical search
Proceedings of the 20th annual international conference on Supercomputing
Miss Rate Prediction Across Program Inputs and Cache Configurations
IEEE Transactions on Computers
A tuning framework for software-managed memory hierarchies
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Instruction balance and its relation to program energy consumption
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Dynamic voltage and frequency scaling for scientific applications
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Model-guided empirical tuning of loop fusion
International Journal of High Performance Systems Architecture
Exposing tunable parameters in multi-threaded numerical code
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Memory Latency Reduction via Thread Throttling
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
A cache-conscious profitability model for empirical tuning of loop fusion
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Generalized index-set splitting
CC'05 Proceedings of the 14th international conference on Compiler Construction
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Removing impediments to loop fusion through code transformations
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing