Efficient and Accurate Analytical Modeling of Whole-Program Data Cache Behavior
IEEE Transactions on Computers
Automatic tiling of iterative stencil loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
A Geometric Programming Framework for Optimal Multi-Level Tiling
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
On combining iteration space tiling with data space tiling for scratch-pad memory systems
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Reducing off-chip memory access via stream-conscious tiling on multimedia applications
International Journal of Parallel Programming
Parameterized tiled loops for free
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
MPSoC memory optimization using program transformation
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Data cache locking for tight timing calculations
ACM Transactions on Embedded Computing Systems (TECS)
Improving the parallelism of iterative methods by aggressive loop fusion
The Journal of Supercomputing
Dynamic tiling for effective use of shared caches on multithreaded processors
International Journal of High Performance Computing and Networking
Multi-level tiling: M for the price of one
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
A practical automatic polyhedral parallelizer and locality optimizer
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Optimizing scientific application loops on stream processors
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Cronus: A platform for parallel code generation based on computational geometry methods
Journal of Systems and Software
Positivity, posynomials and tile size selection
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Parametric multi-level tiling of imperfectly nested loops
Proceedings of the 23rd international conference on Supercomputing
Simultaneous minimization of capacity and conflict misses
Journal of Computer Science and Technology
Exploring parallelization strategies for NUFFT data translation
EMSOFT '09 Proceedings of the seventh ACM international conference on Embedded software
Compact multi-dimensional kernel extraction for register tiling
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Automatic creation of tile size selection models
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Parameterized tiling revisited
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Scratchpad memory allocation for data aggregates via interval coloring in superperfect graphs
ACM Transactions on Embedded Computing Systems (TECS)
Optimization of FDTD computations in a streaming model architecture
PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
Architecture exploration for efficient data transfer and storage in data-parallel applications
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Gather/scatter hardware support for accelerating Fast Fourier Transform
Journal of Systems Architecture: the EUROMICRO Journal
Dynamic multi phase scheduling for heterogeneous clusters
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Automatic generation of FPGA-specific pipelined accelerators
ARC'11 Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications
Parallel graduated assignment algorithm for multiple graph matching based on a common labelling
GbRPR'11 Proceedings of the 8th international conference on Graph-based representations in pattern recognition
Model-driven tile size selection for DOACROSS loops on GPUs
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Combined ILP and register tiling: analytical model and optimization framework
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Aggressive loop fusion for improving locality and parallelism
ISPA'05 Proceedings of the Third international conference on Parallel and Distributed Processing and Applications
Mobile pipelines: parallelizing left-looking algorithms using navigational programming
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Optimization of dense matrix multiplication on IBM Cyclops-64: challenges and experiences
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Efficient tiled loop generation: D-tiling
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
ACM Transactions on Programming Languages and Systems (TOPLAS)
Matrix-based programming optimization for improving memory hierarchy performance on Imagine
ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
Streaming model computation of the FDTD problem
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume Part I
Extendable pattern-oriented optimization directives
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Parallelizing SOR for GPGPUs using alternate loop tiling
Parallel Computing
Analytical bounds for optimal tile size selection
CC'12 Proceedings of the 21st international conference on Compiler Construction
Accelerator-based implementation of the Harris algorithm
ICISP'12 Proceedings of the 5th international conference on Image and Signal Processing
Extendable pattern-oriented optimization directives
ACM Transactions on Architecture and Code Optimization (TACO)
Layout-oblivious optimization for matrix computations
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Tiling stencil computations to maximize parallelism
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Towards data tiling for whole programs in scratchpad memory allocation
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Architecture-based optimization for mapping scientific applications to Imagine
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
FPGA-specific synthesis of loop-nests with pipelined computational cores
Microprocessors & Microsystems
Concurrency and Computation: Practice & Experience
Layout-oblivious compiler optimization for matrix computations
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
MultiMaKe: Chip-multiprocessor driven memory-aware kernel pipelining
ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA
Proceedings of the Conference on Design, Automation and Test in Europe
Compiling affine loop nests for distributed-memory parallel architectures
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Adaptive parallel tiled code generation and accelerated auto-tuning
International Journal of High Performance Computing Applications