Communications of the ACM
A study of branch prediction strategies
25 years of the international symposia on Computer architecture (selected papers)
OpenMP: An Industry-Standard API for Shared-Memory Programming
IEEE Computational Science & Engineering
Document for a Standard Message-Passing Interface
Document for a Standard Message-Passing Interface
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Amdahl's Law in the Multicore Era
Computer
Mars: a MapReduce framework on graphics processors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Validity of the single processor approach to achieving large scale computing capabilities
AFIPS '67 (Spring) Proceedings of the April 18-20, 1967, spring joint computer conference
Hi-index | 0.00 |
For the sake of software compatibility, simulations are often parallelized without much code rewriting. Performances can be further improved by optimizing codes so that to use the maximum power offered by parallel architectures. While this approach can provide some speed-up, performance of parallelized codes can be strongly limited a priori because traditional algorithms have been designed for sequential technologies. Thus, additional increase of performance should ultimately rely on some redesign of algorithms. Here, we redesign an algorithm that has traditionally been used to simulate the folding properties of polymers. We address the issue of performance in the context of biological applications, more particularly in the active field of chromosome modelling. Due to the strong confinement of chromosomes in the cells, simulation of their motion is slowed down by the laborious search for the next valid states to progress. Our redesign, that we call the Possible Futures Algorithm (PFA), relies on the parallel computation of possible evolutions of the same state, which effectively increases the probability to obtain a valid state at each step. We apply PFA on a GPU-based architecture, allowing us to optimally reduce the latency induced by the computation overhead of possible futures. We show that compared to the initial sequential model the acceptance rate of new states significantly increases without impacting the execution time. In particular, the stronger the confinement of the chromosome, the more efficient PFA becomes, making our approach appealing for biological applications. While most of our results were obtained using Fermi architecture GPUs from NVIDIA, we highlight improved performance on the cutting-edge Kepler architecture K20 GPUs.