Suffix arrays: a new method for on-line string searches
SODA '90 Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms
Reducing the space requirement of suffix trees
Software—Practice & Experience
Replacing suffix trees with enhanced suffix arrays
Journal of Discrete Algorithms - SPIRE 2002
Linear work suffix array construction
Journal of the ACM (JACM)
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Linear pattern matching algorithms
SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Amdahl's Law in the Multicore Era
Computer
Optimizing data intensive GPGPU computations for DNA sequence alignment
Parallel Computing
Linear-time construction of suffix arrays
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Space efficient linear time construction of suffix arrays
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
Proceedings of the 37th annual international symposium on Computer architecture
On the limits of GPU acceleration
HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
Simulation of bevel gear cutting with GPGPUs--performance and productivity
Computer Science - Research and Development
Poster: fast GPU read alignment with burrows wheeler transform based index
Proceedings of the 2011 companion on High Performance Computing Networking, Storage and Analysis Companion
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
CAPRI: prediction of compaction-adequacy for handling control-divergence in GPGPU architectures
Proceedings of the 39th Annual International Symposium on Computer Architecture
Inter-warp instruction temporal locality in deep-multithreaded GPUs
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Warp size impact in GPUs: large or small?
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Maximizing SIMD resource utilization in GPGPUs with SIMD lane permutation
Proceedings of the 40th Annual International Symposium on Computer Architecture
The energy case for graph processing on hybrid CPU and GPU systems
IA^3 '13 Proceedings of the 3rd Workshop on Irregular Applications: Architectures and Algorithms
HARP: Harnessing inactive threads in many-core processors
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Hi-index | 0.00 |
GPUs offer drastically different performance characteristics compared to traditional multicore architectures. To explore the tradeoffs exposed by this difference, we refactor MUMmer, a widely-used, highly-engineered bioinformatics application which has both CPU- and GPU-based implementations. We synthesize our experience as three high-level guidelines to design efficient GPU-based applications. First, minimizing the communication overheads is as important as optimizing the computation. Second, trading-off higher computational complexity for a more compact in-memory representation is a valuable technique to increase overall performance (by enabling higher parallelism levels and reducing transfer overheads). Finally, ensuring that the chosen solution entails low pre- and post-processing overheads is essential to maximize the overall performance gains. Based on these insights, MUMmerGPU++, our GPU-based design of the MUMmer sequence alignment tool, achieves, on realistic workloads, up to 4x speedup compared to a previous, highly optimized GPU port.