Evaluating automatic parallelization for efficient execution on shared-memory multiprocessors
ICS '94 Proceedings of the 8th international conference on Supercomputing
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Fusion of Loops for Parallelism and Locality
IEEE Transactions on Parallel and Distributed Systems
Optimal weighted loop fusion for parallel programs
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Maximizing parallelism and minimizing synchronization with affine transforms
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Tuning compiler optimizations for simultaneous multithreading
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Automatic selection of high-order transformations in the IBM XL FORTRAN compilers
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
A Compiler Optimization Algorithm for Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
SMARTS: exploiting temporal locality and parallelism through vertical execution
ICS '99 Proceedings of the 13th international conference on Supercomputing
Locality optimizations for multi-level caches
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Proceedings of the 14th international conference on Supercomputing
Tuning Compiler Optimizations for Simultaneous Multithreading
International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II
Data locality enhancement by memory reduction
ICS '01 Proceedings of the 15th international conference on Supercomputing
Loop optimization for a class of memory-constrained computations
ICS '01 Proceedings of the 15th international conference on Supercomputing
Blocking and array contraction across arbitrarily nested loops using affine partitioning
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Loop fusion for memory space optimization
Proceedings of the 14th international symposium on Systems synthesis
Combined partitioning and data padding for scheduling multiple loop nests
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Proceedings of the sixth ACM SIGPLAN international conference on Functional programming
Space-time trade-off optimization for a class of electronic structure calculations
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Loop fusion for clustered VLIW architectures
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
International Journal of Parallel Programming
Quantifying the Multi-Level Nature of Tiling Interactions
International Journal of Parallel Programming
Loop Restructuring for Data I/O Minimization on Limited On-Chip Memory Embedded Processors
IEEE Transactions on Computers
Data I/O Minimization for Loops on Limited Onchip Memory Processors
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Complexity of Multi-dimensional Loop Alignment
STACS '02 Proceedings of the 19th Annual Symposium on Theoretical Aspects of Computer Science
A Framework for Loop Distribution on Limited On-Chip Memory Processors
CC '00 Proceedings of the 9th International Conference on Compiler Construction
On the Parallel Execution Time of Tiled Loops
IEEE Transactions on Parallel and Distributed Systems
Automatic data mapping of signal processing applications
ASAP '97 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors
Improving Data Locality by Array Contraction
IEEE Transactions on Computers
General loop fusion technique for nested loops considering timing and code size
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Automatic tiling of iterative stencil loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
Identifying and Exploiting Spatial Regularity in Data Memory References
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
New Complexity Results on Array Contraction and Related Problems
Journal of VLSI Signal Processing Systems
A case for a working-set-based memory hierarchy
Proceedings of the 2nd conference on Computing frontiers
Formal loop merging for signal transforms
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Generating cache hints for improved program efficiency
Journal of Systems Architecture: the EUROMICRO Journal
A polynomial-time algorithm for memory space reduction
International Journal of Parallel Programming
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Integrated Loop Optimizations for Data Locality Enhancement of Tensor Contraction Expressions
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
2D data locality: definition, abstraction, and application
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
On minimizing materializations of array-valued temporaries
ACM Transactions on Programming Languages and Systems (TOPLAS)
Expressive power of an algebra for data mining
ACM Transactions on Database Systems (TODS)
Data parallel Haskell: a status report
Proceedings of the 2007 workshop on Declarative aspects of multicore programming
MPSoC memory optimization using program transformation
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Energy minimization with loop fusion and multi-functional-unit scheduling for multidimensional DSP
Journal of Parallel and Distributed Computing
Reducing memory requirements of resource-constrained applications
ACM Transactions on Embedded Computing Systems (TECS)
New algorithms for SIMD alignment
CC'07 Proceedings of the 16th international conference on Compiler construction
Loop transformations for reducing data space requirements of resource-constrained applications
SAS'03 Proceedings of the 10th international conference on Static analysis
Locality enhancement by array contraction
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Model-guided empirical tuning of loop fusion
International Journal of High Performance Systems Architecture
A model for fusion and code motion in an automatic parallelizing compiler
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Memory minimization for tensor contractions using integer linear programming
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Compiler-directed memory management for heterogeneous MPSoCs
Journal of Systems Architecture: the EUROMICRO Journal
Loop transformations: convexity, pruning and optimization
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Loop Distribution and Fusion with Timing and Code Size Optimization
Journal of Signal Processing Systems
Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Optimizing integrated application performance with cache-aware metascheduling
OTM'11 Proceedings of the 2011 Confederated international conference on On the move to meaningful internet systems - Volume Part II
A cache-conscious profitability model for empirical tuning of loop fusion
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Efficient search-space pruning for integrated fusion and tiling transformations
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Loop distribution and fusion with timing and code size optimization for embedded DSPs
EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
MiniTasking: improving cache performance for multiple query workloads
WAIM '06 Proceedings of the 7th international conference on Advances in Web-Age Information Management
Generalized index-set splitting
CC'05 Proceedings of the 14th international conference on Compiler Construction
Removing impediments to loop fusion through code transformations
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Loop fusion and reordering for register file optimization on stream processors
Journal of Systems and Software
Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Revisiting loop fusion in the polyhedral framework
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential
ACM Transactions on Architecture and Code Optimization (TACO)