Nested OpenMP for efficient computation of 3D critical points in multi-block CFD datasets
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
lmbench: portable tools for performance analysis
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
Features for image retrieval: an experimental comparison
Information Retrieval
Data and thread affinity in openmp programs
Proceedings of the 2008 workshop on Memory access on future processors: a solved problem?
A performance study of general-purpose applications on graphics processors using CUDA
Journal of Parallel and Distributed Computing
Benchmarking GPUs to tune dense linear algebra
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
SIAM Journal on Scientific Computing
Roofline: an insightful visual performance model for multicore architectures
Communications of the ACM - A Direct Path to Dependable Software
Implementing sparse matrix-vector multiplication on throughput-oriented processors
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Simulation of bevel gear cutting with GPGPUs--performance and productivity
Computer Science - Research and Development
Assessing OpenMP tasking implementations on NUMA architectures
IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
Efficient backprojection-based synthetic aperture radar computation with many-core processors
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Optimization of geometric multigrid for emerging multi- and manycore processors
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Task-parallel programming on NUMA architectures
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Hi-index | 0.00 |
The Intel Xeon Phi has been introduced as a new type of compute accelerator that is capable of executing native x86 applications. It supports programming models that are well-established in the HPC community, namely MPI and OpenMP, thus removing the necessity to refactor codes for using accelerator-specific programming paradigms. Because of its native x86 support, the Xeon Phi may also be used stand-alone, meaning codes can be executed directly on the device without the need for interaction with a host. In this sense, the Xeon Phi resembles a big SMP on a chip if its 240 logical cores are compared to a common Xeon-based compute node offering up to 32 logical cores. In this work, we compare a Xeon-based two-socket compute node with the Xeon Phi stand-alone in scalability and performance using OpenMP codes. Considering both as individual SMP systems, they come at a very similar price and power envelope, but our results show significant differences in absolute application performance and scalability. We also show in how far common programming idioms for the Xeon multi-core architecture are applicable for the Xeon Phi many-core architecture and which challenges the changing ratio of core count to single core performance poses for the application programmer.