Despite its ease of use, OpenMP has failed to gain widespread use on large-scale systems, largely due to its failure to deliver sufficient performance. Our experience indicates that the cost of initiating OpenMP regions is simply too high for the desired OpenMP usage scenario of many applications. In this paper, we introduce CLOMP, a new benchmark to characterize this aspect of OpenMP implementations accurately. CLOMP complements the existing EPCC benchmark suite to provide simple, easy-to-understand measurements of OpenMP overheads in the context of application usage scenarios. Our results for several OpenMP implementations demonstrate that CLOMP identifies the amount of work required to compensate for the overheads observed with EPCC. Further, we show that CLOMP also captures limitations for OpenMP parallelization on NUMA systems.
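To make the idea of region start-up cost concrete, the following is a minimal sketch, not CLOMP itself. It times a loop over a set of parts with a new OpenMP parallel region opened on every repetition, and the same loop run serially. The names time_per_loop, update_part, and flops_per_part are hypothetical and chosen only for illustration. Increasing flops_per_part until the OpenMP time falls below the serial time gives a rough estimate of the work needed to offset the overhead.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds. Falls back to clock() (CPU time) when the file is
 * built without OpenMP, in which case the code runs on a single thread. */
static double now_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical work unit: apply flops_per_part dependent updates to one part. */
static double update_part(double value, int flops_per_part) {
    for (int k = 0; k < flops_per_part; ++k)
        value = value * 0.999999 + 1.0e-6;
    return value;
}

/* Average seconds per pass over all parts, measured over `repeats` passes.
 * When use_openmp is nonzero, every pass opens a new parallel region, so the
 * result includes the region start-up cost the abstract refers to. */
double time_per_loop(double *parts, int num_parts, int flops_per_part,
                     int repeats, int use_openmp) {
    assert(parts != NULL && num_parts > 0 && repeats > 0);
    double start = now_seconds();
    for (int r = 0; r < repeats; ++r) {
        if (use_openmp) {
            #pragma omp parallel for schedule(static)
            for (int i = 0; i < num_parts; ++i)
                parts[i] = update_part(parts[i], flops_per_part);
        } else {
            for (int i = 0; i < num_parts; ++i)
                parts[i] = update_part(parts[i], flops_per_part);
        }
    }
    return (now_seconds() - start) / repeats;
}
```

Built with an OpenMP-enabled compiler (for example, -fopenmp with GCC), calling time_per_loop twice with the same arguments, once with use_openmp set to 0 and once set to 1, and sweeping flops_per_part upward shows where the parallel version starts to pay off. Timings for very small loops are noisy, so a large repeats value is advisable.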