A methodology for parallelizing programs for multicomputers and complex memory multiprocessors
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Improving register allocation for subscripted variables
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Scanning polyhedra with DO loops
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Tiling multidimensional iteration spaces for nonshared memory machines
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Compiler blockability of numerical algorithms
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
MOB forms: a class of multilevel block algorithms for dense linear algebra operations
ICS '94 Proceedings of the 8th international conference on Supercomputing
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
A data cache with multiple caching strategies tuned to different types of locality
ICS '95 Proceedings of the 9th international conference on Supercomputing
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Data-centric multi-level blocking
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Determining the idle time of a tiling
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Compiler blockability of dense matrix factorizations
ACM Transactions on Mathematical Software (TOMS)
A Software Approach to Avoiding Spatial Cache Collisions in Parallel Processor Systems
IEEE Transactions on Parallel and Distributed Systems
Schedule-independent storage mapping for loops
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
High-level semantic optimization of numerical codes
ICS '99 Proceedings of the 13th international conference on Supercomputing
An experimental evaluation of tiling and shackling for memory hierarchy management
ICS '99 Proceedings of the 13th international conference on Supercomputing
Selecting tile shape for minimal execution time
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Quantifying loop nest locality using SPEC'95 and the perfect benchmarks
ACM Transactions on Computer Systems (TOCS)
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Automated cache optimizations using CME driven diagnosis
Proceedings of the 14th international conference on Supercomputing
From flop to megaflops: Java for technical computing
ACM Transactions on Programming Languages and Systems (TOPLAS)
Tiling imperfectly-nested loop nests
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Tiling optimizations for 3D scientific computations
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Optimal tiling for the RNA base pairing problem
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Register tiling in nonrectangular iteration spaces
ACM Transactions on Programming Languages and Systems (TOPLAS)
Reducing Cache Conflicts by Multi-Level Cache Partitioning and Array Elements Mapping
The Journal of Supercomputing
Quantifying the Multi-Level Nature of Tiling Interactions
International Journal of Parallel Programming
Data-Centric Transformations for Locality Enhancement
International Journal of Parallel Programming
An Iteration Partition Approach for Cache or Local Memory Thrashing on Parallel Processing
IEEE Transactions on Computers
Loop Restructuring for Data I/O Minimization on Limited On-Chip Memory Embedded Processors
IEEE Transactions on Computers
Compile-Time Partitioning of Iterative Parallel Loops to Reduce Cache Coherency Traffic
IEEE Transactions on Parallel and Distributed Systems
Source Code and Task Graphs in Program Optimization
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
The Combined Effectiveness of Unimodular Transformations, Tiling, and Software Prefetching
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
From Flop to MegaFlops: Java for Technical Computing
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Data I/O Minimization for Loops on Limited Onchip Memory Processors
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
A Framework for Loop Distribution on Limited On-Chip Memory Processors
CC '00 Proceedings of the 9th International Conference on Compiler Construction
On the Parallel Execution Time of Tiled Loops
IEEE Transactions on Parallel and Distributed Systems
A comparison of empirical and model-driven optimization
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Three-dimensional orthogonal tile sizing problem: mathematical programming approach
ASAP '97 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors
A Quantitative Analysis of Tile Size Selection Algorithms
The Journal of Supercomputing
Improving register allocation for subscripted variables
ACM SIGPLAN Notices - Best of PLDI 1979-1999
A Geometric Programming Framework for Optimal Multi-Level Tiling
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Sparse Tiling for Stationary Iterative Methods
International Journal of High Performance Computing Applications
Memory Coloring: A Compiler Approach for Scratchpad Memory Management
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Code restructuring for improving cache performance of MPSoCs
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
The Journal of Supercomputing
Locality and parallelism optimization for dynamic programming algorithm in bioinformatics
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
A memory model for scientific algorithms on graphics processors
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Costing stepwise refinements of parallel programs
Computer Languages, Systems and Structures
An experimental comparison of cache-oblivious and cache-conscious programs
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
A parallel dynamic programming algorithm on a multi-core architecture
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Loop Optimization using Hierarchical Compilation and Kernel Decomposition
Proceedings of the International Symposium on Code Generation and Optimization
Forma: A framework for safe automatic array reshaping
ACM Transactions on Programming Languages and Systems (TOPLAS)
Cache-efficient numerical algorithms using graphics hardware
Parallel Computing
Block size selection of parallel LU and QR on PVP-based and RISC-based supercomputers
CHINA HPC '07 Proceedings of the 2007 Asian technology information program's (ATIP's) 3rd workshop on High performance computing in China: solution approaches to impediments for high performance computing
Program optimization carving for GPU computing
Journal of Parallel and Distributed Computing
How to Write Fast Numerical Code: A Small Introduction
Generative and Transformational Techniques in Software Engineering II
Smashing: Folding Space to Tile through Time
Languages and Compilers for Parallel Computing
Compiler-directed scratchpad memory management via graph coloring
ACM Transactions on Architecture and Code Optimization (TACO)
Automating the generation of composed linear algebra kernels
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Parallel loop generation and scheduling
The Journal of Supercomputing
PolyAPM: parallel programming via stepwise refinement with abstract parallel machines
IFL'02 Proceedings of the 14th international conference on Implementation of functional languages
Iterative compilation with kernel exploration
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Custom memory allocation for free
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Loop parallelization in multi-dimensional cartesian space
PSI'06 Proceedings of the 6th international Andrei Ershov memorial conference on Perspectives of systems informatics
Improving data locality by chunking
CC'03 Proceedings of the 12th international conference on Compiler construction
Scratchpad memory allocation for data aggregates via interval coloring in superperfect graphs
ACM Transactions on Embedded Computing Systems (TECS)
ULCC: a user-level facility for optimizing shared cache performance on multicores
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Optimizing matrix multiplication with a classifier learning system
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Analytic models and empirical search: a hybrid approach to code optimization
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Optimization of dense matrix multiplication on IBM cyclops-64: challenges and experiences
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Forward communication only placements and their use for parallel program construction
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Combining performance aspects of irregular gauss-seidel via sparse tiling
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Efficient tiled loop generation: D-tiling
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
ACM Transactions on Programming Languages and Systems (TOPLAS)
Hierarchical overlapped tiling
Proceedings of the Tenth International Symposium on Code Generation and Optimization
A simple GPU-accelerated two-dimensional MUSCL-Hancock solver for ideal magnetohydrodynamics
Journal of Computational Physics
Hi-index | 0.01 |