Loop tiling for parallelism

Authors: Jingling Xue
Affiliations: Univ. of New South Wales, Sydney, Australia
Venue: Loop tiling for parallelism
Year: 2000

Citing 0
Cited 55

Efficient and Accurate Analytical Modeling of Whole-Program Data Cache Behavior
IEEE Transactions on Computers

Automatic tiling of iterative stencil loops
ACM Transactions on Programming Languages and Systems (TOPLAS)

A Geometric Programming Framework for Optimal Multi-Level Tiling
Proceedings of the 2004 ACM/IEEE conference on Supercomputing

On combining iteration space tiling with data space tiling for scratch-pad memory systems
Proceedings of the 2005 Asia and South Pacific Design Automation Conference

Reducing off-chip memory access via stream-conscious tiling on multimedia applications
International Journal of Parallel Programming

Parameterized tiled loops for free
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation

MPSoC memory optimization using program transformation
ACM Transactions on Design Automation of Electronic Systems (TODAES)

Data cache locking for tight timing calculations
ACM Transactions on Embedded Computing Systems (TECS)

Improving the parallelism of iterative methods by aggressive loop fusion
The Journal of Supercomputing

Dynamic tiling for effective use of shared caches on multithreaded processors
International Journal of High Performance Computing and Networking

Multi-level tiling: M for the price of one
Proceedings of the 2007 ACM/IEEE conference on Supercomputing

A practical automatic polyhedral parallelizer and locality optimizer
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation

Optimizing scientific application loops on stream processors
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems

Cronus: A platform for parallel code generation based on computational geometry methods
Journal of Systems and Software

Positivity, posynomials and tile size selection
Proceedings of the 2008 ACM/IEEE conference on Supercomputing

Parametric multi-level tiling of imperfectly nested loops
Proceedings of the 23rd international conference on Supercomputing

Simultaneous minimization of capacity and conflict misses
Journal of Computer Science and Technology

Exploring parallelization strategies for NUFFT data translation
EMSOFT '09 Proceedings of the seventh ACM international conference on Embedded software

Compact multi-dimensional kernel extraction for register tiling
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

Automatic creation of tile size selection models
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization

Parameterized tiling revisited
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization

Studying the impact of synchronization frequency on scheduling tasks with dependencies in heterogeneous systems
Performance Evaluation

Scratchpad memory allocation for data aggregates via interval coloring in superperfect graphs
ACM Transactions on Embedded Computing Systems (TECS)

Optimization of FDTD computations in a streaming model architecture
PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I

Architecture exploration for efficient data transfer and storage in data-parallel applications
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I

Gather/scatter hardware support for accelerating Fast Fourier Transform
Journal of Systems Architecture: the EUROMICRO Journal

Dynamic multi phase scheduling for heterogeneous clusters
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

Automatic generation of FPGA-specific pipelined accelerators
ARC'11 Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications

Parallel graduated assignment algorithm for multiple graph matching based on a common labelling
GbRPR'11 Proceedings of the 8th international conference on Graph-based representations in pattern recognition

Model-driven tile size selection for DOACROSS loops on GPUs
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II

Combined ILP and register tiling: analytical model and optimization framework
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing

Aggressive loop fusion for improving locality and parallelism
ISPA'05 Proceedings of the Third international conference on Parallel and Distributed Processing and Applications

Mobile pipelines: parallelizing left-looking algorithms using navigational programming
HiPC'05 Proceedings of the 12th international conference on High Performance Computing

Optimization of dense matrix multiplication on IBM Cyclops-64: challenges and experiences
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing

Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming

Efficient tiled loop generation: D-tiling
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing

Parameterized loop tiling
ACM Transactions on Programming Languages and Systems (TOPLAS)

Matrix-Based programming optimization for improving memory hierarchy performance on Imagine
ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications

Streaming model computation of the FDTD problem
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume Part I

Extendable pattern-oriented optimization directives
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization

Parallelizing SOR for GPGPUs using alternate loop tiling
Parallel Computing

Analytical bounds for optimal tile size selection
CC'12 Proceedings of the 21st international conference on Compiler Construction

Accelerator-Based implementation of the Harris algorithm
ICISP'12 Proceedings of the 5th international conference on Image and Signal Processing

Extendable pattern-oriented optimization directives
ACM Transactions on Architecture and Code Optimization (TACO)

Layout-oblivious optimization for matrix computations
Proceedings of the 21st international conference on Parallel architectures and compilation techniques

Tiling stencil computations to maximize parallelism
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Towards data tiling for whole programs in scratchpad memory allocation
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture

Architecture-based optimization for mapping scientific applications to Imagine
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications

FPGA-specific synthesis of loop-nests with pipelined computational cores
Microprocessors & Microsystems

Towards the optimal synchronization granularity for dynamic scheduling of pipelined computations on heterogeneous computing systems
Concurrency and Computation: Practice & Experience

Layout-oblivious compiler optimization for matrix computations
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers

MultiMaKe: Chip-multiprocessor driven memory-aware kernel pipelining
ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems

Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA
Proceedings of the Conference on Design, Automation and Test in Europe

Compiling affine loop nests for distributed-memory parallel architectures
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Adaptive parallel tiled code generation and accelerated auto-tuning
International Journal of High Performance Computing Applications
