KFusion: optimizing data flow without compromising modularity

Authors:
Liam Kiemele;Celina Berg;Aaron Gulliver;Yvonne Coady
Affiliations:
University of Victoria, Victoria, BC, Canada;University of Victoria, Victoria, BC, Canada;University of Victoria, Victoria, BC, Canada;University of Victoria, Victoria, BC, Canada
Venue:
Proceedings of the 12th annual international conference on Aspect-oriented software development
Year:
2013

Abstract

Programming language support for multi-core architectures introduces a fundamentally new mechanism for modularity: the kernel. Though it can be used as a means to separate concerns, a kernel is given a clean slate of memory at execution time. As a consequence, application developers attempting to leverage libraries of kernels often incur substantial, unanticipated performance penalties. Currently, the only recourse is to compromise modularity for the sake of optimizing data flow on an application-specific basis. KFusion is our prototype tool for optimizing libraries of kernels according to application-specific needs. Our goal is to shield application developers from manually performing loop fusion and deforestation in compositions of low-level kernels that share data. Libraries, augmented by domain experts with annotations to ensure correct compositions of kernels, give application developers the opportunity to supply hints according to customized data flow needs, keeping modularity intact. In the worst case, an inaccurate hint incurs no penalty. Case studies of applications using general-purpose libraries for linear algebra, image manipulation, and physics engines show that KFusion can substantially reduce the performance penalties associated with memory bandwidth bottlenecks.
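
The data flow problem described in the abstract can be illustrated with a minimal sketch. This is plain Python standing in for data-parallel kernels, not KFusion's actual input or output, and the function names (`scale`, `add`, `unfused`, `fused_scale_add`) are hypothetical. The modular version composes two library kernels through an intermediate array; the fused version does the same work in one pass, which is the kind of loop fusion and deforestation the abstract says developers should not have to do by hand.

```python
# Illustrative only: plain-Python stand-ins for data-parallel kernels.
# All function names here are hypothetical, not part of KFusion.

def scale(xs, a):
    """Kernel 1: reads xs and writes a new intermediate array."""
    return [a * x for x in xs]

def add(xs, ys):
    """Kernel 2: reads two arrays and writes the result array."""
    return [x + y for x, y in zip(xs, ys)]

def unfused(xs, ys, a):
    # Modular composition: the intermediate array is fully written by
    # scale and then fully re-read by add.
    tmp = scale(xs, a)
    return add(tmp, ys)

def fused_scale_add(xs, ys, a):
    # Fused version: a single pass with no intermediate array; the
    # scaled value stays local to each iteration.
    return [a * x + y for x, y in zip(xs, ys)]

xs = [1.0, 2.0, 3.0]
ys = [10.0, 20.0, 30.0]
result_unfused = unfused(xs, ys, 2.0)
result_fused = fused_scale_add(xs, ys, 2.0)
```

Both versions compute the same values; the difference is only in how much data moves through memory between the two steps. On hardware where each kernel starts from a clean slate of memory, that intermediate traffic is where the bandwidth penalty would be expected to arise.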