Developers need to be able to write code using high-level, reusable black-box components. They also need confidence that this code can be mapped to an efficient implementation on the available hardware, with robust high performance. In this paper we present a prototype component library being developed to deliver this for industrial visual effects applications. Components are based on abstract algorithmic skeletons that provide metadata characterizing data accesses and dependence constraints. This metadata is combined at run time to build a polytope representation that supports aggressive inter-component loop fusion. We present results for a wavelet-transform-based degraining filter running on multicore PC hardware, demonstrating speed-ups of 3.4x to 5.3x, improved parallel efficiency and a 30% reduction in memory consumption without compromising the program structure.
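To make the idea of metadata-driven fusion concrete, the following is a minimal sketch and not the library described above. It assumes a much simpler setting than a polytope representation: each component is a 1-D stencil whose access metadata is just a list of relative input offsets, and borders are handled by clamping. The names `Stage`, `run_stage`, `run_fused`, `blur` and `diff` are hypothetical and chosen only for illustration. From the consumer's offsets the sketch derives how far the consumer must trail the producer in a fused loop, and how small the intermediate buffer can be.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Stage:
    """Hypothetical 1-D component: out[i] = kernel(*[inp[i + o] for o in offsets])."""
    offsets: Sequence[int]        # access metadata: relative input indices read
    kernel: Callable[..., float]  # the computation itself, treated as a black box


def clamp(i: int, n: int) -> int:
    """Clamp-to-edge border handling (an assumption of this sketch)."""
    return min(max(i, 0), n - 1)


def run_stage(stage: Stage, inp: List[float]) -> List[float]:
    """Reference execution: one full loop and one full-size output per component."""
    n = len(inp)
    return [stage.kernel(*[inp[clamp(i + o, n)] for o in stage.offsets])
            for i in range(n)]


def run_fused(producer: Stage, consumer: Stage, src: List[float]) -> List[float]:
    """Run both components in one loop, keeping the intermediate in a ring buffer."""
    n = len(src)
    # Dependence constraint derived from the metadata: the consumer trails the
    # producer by `shift` iterations so that every value it reads already exists.
    shift = max(0, max(consumer.offsets))
    # Number of intermediate values that must stay live at any one time.
    window = shift - min(0, min(consumer.offsets)) + 1
    ring = [0.0] * window
    out = [0.0] * n
    for t in range(n + shift):
        if t < n:  # producer iteration t
            ring[t % window] = producer.kernel(
                *[src[clamp(t + o, n)] for o in producer.offsets])
        i = t - shift  # consumer iteration, delayed by `shift`
        if i >= 0:
            out[i] = consumer.kernel(
                *[ring[clamp(i + o, n) % window] for o in consumer.offsets])
    return out


# Two illustrative components: a 3-point average followed by a central difference.
blur = Stage(offsets=(-1, 0, 1), kernel=lambda a, b, c: (a + b + c) / 3.0)
diff = Stage(offsets=(-1, 0, 1), kernel=lambda a, b, c: c - a)

if __name__ == "__main__":
    data = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0]
    unfused = run_stage(diff, run_stage(blur, data))
    print(run_fused(blur, diff, data) == unfused)
```

In this sketch the fused version replaces the full-size intermediate array with a buffer of three elements, which is the kind of memory reduction that fusion makes possible. A real polytope-based approach would handle multi-dimensional iteration spaces and more general dependences than the fixed offsets assumed here.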