Multi-core and many-core processors have been major trends for the past six years and are expected to remain so for the next decade. As parallel computing spreads, it becomes increasingly difficult to decide on which processor to run a given application, mainly because programming these processors has become increasingly challenging. In this work, we present a model that predicts the performance of a given application on a multi-core or many-core processor. Since programming these processors can be challenging and time-consuming, our model does not require source code to be available for the target processor. This is in contrast to existing performance prediction techniques, such as mathematical models and simulators, which require code to be available and optimized for the target architecture. To enable performance prediction prior to algorithm implementation, we classify algorithms using an existing algorithm classification. For each class, we create a specific instance of the roofline model, resulting in a new class-specific model. This new model, named the boat hull model, enables performance prediction and processor selection prior to the development of architecture-specific code. We demonstrate the boat hull model using GPUs and CPUs as target architectures and show that performance is accurately predicted for an example real-life application.
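As a rough illustration of the roofline bound that the boat hull model builds on, the following Python sketch computes attainable performance as the minimum of peak compute throughput and peak memory bandwidth times operational intensity, and then picks the processor with the lowest predicted run time. It covers only the generic roofline model, not the class-specific boat hull model, whose details are not given here. The `Processor` type, the function names, and all hardware numbers are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    """Hypothetical processor description; the figures are not measurements."""
    name: str
    peak_gflops: float    # assumed peak compute throughput in GFLOP/s
    peak_gbytes_s: float  # assumed peak memory bandwidth in GB/s


def roofline_gflops(proc: Processor, intensity: float) -> float:
    """Attainable GFLOP/s at a given operational intensity (FLOP per byte)."""
    return min(proc.peak_gflops, proc.peak_gbytes_s * intensity)


def predicted_seconds(proc: Processor, flops: float, bytes_moved: float) -> float:
    """Lower bound on run time implied by the roofline bound."""
    intensity = flops / bytes_moved
    return flops / (roofline_gflops(proc, intensity) * 1e9)


def select_processor(procs, flops: float, bytes_moved: float) -> Processor:
    """Pick the processor with the smallest predicted run time."""
    return min(procs, key=lambda p: predicted_seconds(p, flops, bytes_moved))


# Illustrative, made-up targets: one CPU and one GPU.
CPU = Processor("cpu", peak_gflops=100.0, peak_gbytes_s=25.0)
GPU = Processor("gpu", peak_gflops=1000.0, peak_gbytes_s=150.0)

if __name__ == "__main__":
    # A kernel with 2 GFLOP of work and 1 GB of memory traffic (intensity 2).
    best = select_processor([CPU, GPU], flops=2e9, bytes_moved=1e9)
    print(best.name, predicted_seconds(best, 2e9, 1e9))
```

In this sketch both example processors are bandwidth-bound at an intensity of 2 FLOP/byte, so the selection is driven by memory bandwidth rather than peak compute. A class-specific model would presumably replace the single pair of ceilings with values appropriate to each algorithm class.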