Proceedings of the 1989 ACM/IEEE conference on Supercomputing
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Improving locality and parallelism in nested loops
Improving locality and parallelism in nested loops
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Influence of cross-interferences on blocked loops: a case study with matrix-vector multiply
ACM Transactions on Programming Languages and Systems (TOPLAS)
Internal organization of the Alpha 21164, a 300-MHz 64-bit quad-issue CMOS RISC microprocessor
Digital Technical Journal - Special 10th anniversary issue
Cache miss equations: an analytical representation of cache misses
ICS '97 Proceedings of the 11th international conference on Supercomputing
Precise miss analysis for program transformations with caches of arbitrary associativity
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
On Estimating and Enhancing Cache Effectiveness
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
A compiler analysis of cache interference and its applications to compiler optimizations
A compiler analysis of cache interference and its applications to compiler optimizations
Locality optimizations for multi-level caches
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Optimal partitioning and balanced scheduling with the maximal overlap of data footprints
GLSVLSI '01 Proceedings of the 11th Great Lakes symposium on VLSI
Tiling optimizations for 3D scientific computations
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Minimizing Average Schedule Length under Memory Constraints by Optimal Partitioning and Prefetching
Journal of VLSI Signal Processing Systems
Scheduling and partitioning for multiple loop nests
Proceedings of the 14th international symposium on Systems synthesis
Combined partitioning and data padding for scheduling multiple loop nests
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation - SAMOS
Embedded processor design challenges
A Quantitative Analysis of Tile Size Selection Algorithms
The Journal of Supercomputing
Automatic tiling of iterative stencil loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
A Geometric Programming Framework for Optimal Multi-Level Tiling
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Integrated Loop Optimizations for Data Locality Enhancement of Tensor Contraction Expressions
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Empirical optimization for a sparse linear solver: a case study
International Journal of Parallel Programming - Special issue: The next generation software program
Partitioning and scheduling DSP applications with maximal memory access hiding
EURASIP Journal on Applied Signal Processing
Fast indexing for blocked array layouts to reduce cache misses
International Journal of High Performance Computing and Networking
Dynamic tiling for effective use of shared caches on multithreaded processors
International Journal of High Performance Computing and Networking
Positivity, posynomials and tile size selection
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Simultaneous minimization of capacity and conflict misses
Journal of Computer Science and Technology
Automatic creation of tile size selection models
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
On the interaction of tiling and automatic parallelization
IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming
Tuning blocked array layouts to exploit memory hierarchy in SMT architectures
PCI'05 Proceedings of the 10th Panhellenic conference on Advances in Informatics
Analytical bounds for optimal tile size selection
CC'12 Proceedings of the 21st international conference on Compiler Construction
Benefits of using parallelized non-progressive network coding
Journal of Network and Computer Applications
Automatic OpenCL work-group size selection for multicore CPUs
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Adaptive Mapping and Parameter Selection Scheme to Improve Automatic Code Generation for GPUs
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |