Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Communication optimization and code generation for distributed memory machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Evaluating automatic parallelization for efficient execution on shared-memory multiprocessors
ICS '94 Proceedings of the 8th international conference on Supercomputing
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
Detecting coarse-grain parallelism using an interprocedural parallelizing compiler
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Advanced compilation techniques in the PARADIGM compiler for distributed-memory multicomputers
ICS '95 Proceedings of the 9th international conference on Supercomputing
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
A quantitative analysis of loop nest locality
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Combining loop transformations considering caches and scheduling
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Fusion of Loops for Parallelism and Locality
IEEE Transactions on Parallel and Distributed Systems
Maximizing parallelism and minimizing synchronization with affine transforms
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A Compiler Optimization Algorithm for Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
New tiling techniques to improve cache temporal locality
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
An affine partitioning algorithm to maximize parallelism and minimize communication
ICS '99 Proceedings of the 13th international conference on Supercomputing
A tile selection algorithm for data locality and cache interference
ICS '99 Proceedings of the 13th international conference on Supercomputing
IEEE Transactions on Parallel and Distributed Systems
A preprocessing step for global loop transformations for data transfer optimization
CASES '00 Proceedings of the 2000 international conference on Compilers, architecture, and synthesis for embedded systems
Exploiting Wavefront Parallelism on Large-Scale Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Data locality enhancement by memory reduction
ICS '01 Proceedings of the 15th international conference on Supercomputing
Blocking and array contraction across arbitrarily nested loops using affine partitioning
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Register tiling in nonrectangular iteration spaces
ACM Transactions on Programming Languages and Systems (TOPLAS)
An I/O-Conscious Tiling Strategy for Disk-Resident Data Sets
The Journal of Supercomputing
Compilation of Vector Statements of C[] Language for Architectures with Multilevel Memory Hierarchy
Programming and Computer Software
Enabling unimodular transformations
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Combining Loop Fusion with Prefetching on Shared-memory Multiprocessors
ICPP '97 Proceedings of the international Conference on Parallel Processing
Eliminating Stale Data References through Array Data-Flow Analysis
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
An Integrated Framework for Compiler-Directed Cache Coherence and Data Prefetching
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Tiling and Memory Reuse for Sequences of Nested Loops
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Transformations on Doubly Nested Loops
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Locality Enhancement for Large-Scale Shared-Memory Multiprocessors
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
On the Parallel Execution Time of Tiled Loops
IEEE Transactions on Parallel and Distributed Systems
QR factorization for shared memory and message passing
Parallel Computing
Partitioning Loops with Variable Dependence Distances
ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Automatic tiling of iterative stencil loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
Combining Models and Guided Empirical Search to Optimize for Multiple Levels of the Memory Hierarchy
Proceedings of the international symposium on Code generation and optimization
Exploiting Inter-Processor Data Sharing for Improving Behavior of Multi-Processor SoCs
ISVLSI '05 Proceedings of the IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design
Interprocedural parallelization analysis in SUIF
ACM Transactions on Programming Languages and Systems (TOPLAS)
A polynomial-time algorithm for memory space reduction
International Journal of Parallel Programming
Facilitating the search for compositions of program transformations
Proceedings of the 19th annual international conference on Supercomputing
In search of a program generator to implement generic transformations for high-performance computing
Science of Computer Programming - Special issue on the first MetaOCaml workshop 2004
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies
International Journal of Parallel Programming
Implicit and explicit optimizations for stencil computations
Proceedings of the 2006 workshop on Memory system performance and correctness
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time
Proceedings of the International Symposium on Code Generation and Optimization
Buffer and Register Allocation for Memory Space Optimization
Journal of VLSI Signal Processing Systems
Finding Synchronization-Free Slices of Operations in Arbitrarily Nested Loops
ICCSA '08 Proceedings of the international conference on Computational Science and Its Applications, Part II
Extracting synchronization-free slices of operations in perfectly-nested loops
PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Finding synchronization-free parallelism for non-uniform loops
ICCS'03 Proceedings of the 2003 international conference on Computational science: Part II
Finding coarse grained parallelism in computational geometry algorithms
ICCSA'03 Proceedings of the 2003 international conference on Computational science and its applications: Part III
Strength reduction of integer division and modulo operations
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Adaptive prefetching for shared cache based chip multiprocessors
Proceedings of the Conference on Design, Automation and Test in Europe
Runtime biased pointer reuse analysis and its application to energy efficiency
PACS'03 Proceedings of the Third international conference on Power-Aware Computer Systems
A software approach for combating asymmetries of non-volatile memories
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
C1C: A configurable, compiler-guided STT-RAM L1 cache
ACM Transactions on Architecture and Code Optimization (TACO)