Supercompilers for parallel and vector computers
Supercompilers for parallel and vector computers
Introduction to parallel algorithms and architectures: array, trees, hypercubes
Introduction to parallel algorithms and architectures: array, trees, hypercubes
An integrated compile-time/run-time software distributed shared memory system
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Theory, techniques, and experiments in solving recurrences in computer programs
Theory, techniques, and experiments in solving recurrences in computer programs
IEEE Transactions on Parallel and Distributed Systems
Parallel Programming with Polaris
Computer
Experience in the Automatic Parallelization of Four Perfect-Benchmark Programs
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Improving Compiler and Run-Time Support for Adaptive Irregular Codes
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Improving parallel irregular reductions using partial array expansion
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Processing large-scale multi-dimensional data in parallel and distributed environments
Parallel Computing - Parallel data-intensive algorithms and applications
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
The R-LRPD Test: Speculative Parallelization of Partially Parallel Loops
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A Comparison of Parallelization Techniques for Irregular Reductions
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
SmartApps: An Application Centric Approach to High Performance Computing
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Irregular Assignment Computations on cc-NUMA Multiprocessors
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Towards Detection of Coarse-Grain Loop-Level Parallelism in Irregular Computations
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
A Comparison of Locality Transformations for Irregular Codes
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
A GSA-based compiler infrastructure to extract parallelism from complex loops
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Optimization techniques for parallel irregular reductions
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Parallel, distributed and network-based processing
IEEE Transactions on Knowledge and Data Engineering
Optimizing Reduction Computations In a Distributed Environment
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
A framework for adaptive algorithm selection in STAPL
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Parallel techniques in irregular codes: cloth simulation as case of study
Journal of Parallel and Distributed Computing
A methodology for detailed performance modeling of reduction computations on SMP machines
Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
SmartApps: middle-ware for adaptive applications on reconfigurable platforms
ACM SIGOPS Operating Systems Review
Exploiting Locality for Irregular Scientific Codes
IEEE Transactions on Parallel and Distributed Systems
An Adaptive Algorithm Selection Framework for Reduction Parallelization
IEEE Transactions on Parallel and Distributed Systems
Runtime characterisation of irregular accesses applied to parallelisation of irregular reductions
International Journal of Computational Science and Engineering
A translation system for enabling data mining applications on GPUs
Proceedings of the 23rd international conference on Supercomputing
Balanced, locality-based parallel irregular reductions
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
An adaptive scheme for dynamic parallelization
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Speculative parallelization of partial reduction variables
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
On improving the performance of data partitioning oriented parallel irregular reductions
EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Proceedings of the international conference on Supercomputing
Adaptive parallel approximate similarity search for responsive multimedia retrieval
Proceedings of the 20th ACM international conference on Information and knowledge management
An inspector-executor algorithm for irregular assignment parallelization
ISPA'04 Proceedings of the Second international conference on Parallel and Distributed Processing and Applications
Parallel reductions: an application of adaptive algorithm selection
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Hi-index | 0.00 |
In this paper, we propose to adapt parallelizing transformations, more specifically, reduction parallelizations, to the actual reference pattern executed by a loop, i.e., to the particular input data and dynamic phase of a program. More precisely we will show how, after validating a reduction at run-time (when this is not possible at compile time) we can dynamically characterize its reference pattern and choose the most appropriate method for parallelizing it. For this purpose, we develop a library of parallel reduction algorithms, including both previously known and novel schemes, which includes algorithms specialized for different classes of access behavior. In particular, each algorithm in our library has identified strengths related to specific reference pattern characteristics, which are matched, at run-time, with measured characteristics of the actual reference pattern. The matching of algorithm to reference pattern is performed using a decision-tree based selection scheme. The contribution of this work consists in new optimizations for reduction parallelization and in the introduction of a new approach to the optimization of irregular applications: Characteristic based customization.