The desire for general-purpose computation on graphics processing units has driven the development of new programming paradigms, e.g., OpenCL C/C++, CUDA C, or the PGI Accelerator Model. In this paper, we apply these programming approaches to KegelSpan, a software package for simulating bevel gear cutting. This engineering application simulates an important manufacturing process in the automotive industry. We compare the results with an OpenMP implementation on various hardware configurations. The discussion covers both the performance results and the productivity of code development experienced in this effort.