Given the importance of parallel mesh generation in large-scale scientific applications and the proliferation of multilevel SMT-based architectures, it is imperative to gain insight into the interaction between meshing algorithms and these systems. We focus on Parallel Constrained Delaunay Mesh (PCDM) generation. We exploit coarse-grain parallelism at the subdomain level and fine-grain parallelism at the element level. This multigrain data-parallel approach targets clusters built from low-end, commercially available SMTs. Our experimental evaluation shows that current SMTs are not capable of exploiting the fine-grain parallelism in PCDM. However, experiments on a simulated SMT indicate that, with modest hardware support, these fine-grain parallelism opportunities can be exploited. Exploiting fine-grain parallelism yields higher performance than a pure MPI implementation. It also closes the gap between the performance of PCDM and that of the state-of-the-art sequential mesher on a single physical processor. Our findings extend to other adaptive, irregular, multigrain parallel algorithms.
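The two-level (multigrain) decomposition described above can be illustrated with a minimal Python sketch. It makes several simplifying assumptions, and all function names are hypothetical. Each subdomain is refined independently, which stands in for the coarse grain. Inside a subdomain, the "bad" elements of each round are refined concurrently, which stands in for the fine grain. Both levels use `concurrent.futures.ThreadPoolExecutor`, whereas the coarse level in PCDM is distributed. The refinement step is a toy midpoint subdivision driven by an area bound. It is not constrained Delaunay refinement: it ignores cavity conflicts between neighbouring elements and the hanging nodes that subdivision creates. Because of CPython's GIL, the sketch shows structure only, not speedup.

```python
from concurrent.futures import ThreadPoolExecutor

Point = tuple[float, float]
Triangle = tuple[Point, Point, Point]


def area(t: Triangle) -> float:
    """Unsigned area of a triangle via the cross product."""
    (x1, y1), (x2, y2), (x3, y3) = t
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0


def midpoint(a: Point, b: Point) -> Point:
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def split(t: Triangle) -> list[Triangle]:
    """Toy refinement (hypothetical): split into 4 triangles via edge midpoints."""
    a, b, c = t
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def refine_subdomain(tris: list[Triangle], max_area: float,
                     fine_workers: int) -> list[Triangle]:
    """Fine grain: refine the current bad elements of one subdomain concurrently."""
    good: list[Triangle] = []
    work = list(tris)
    with ThreadPoolExecutor(max_workers=fine_workers) as pool:
        while work:
            bad = [t for t in work if area(t) > max_area]
            good.extend(t for t in work if area(t) <= max_area)
            # Each bad element is split by a separate task; the children
            # form the next round's work list.
            work = [child for children in pool.map(split, bad)
                    for child in children]
    return good


def refine_mesh(subdomains: list[list[Triangle]], max_area: float,
                coarse_workers: int = 2,
                fine_workers: int = 2) -> list[list[Triangle]]:
    """Coarse grain: refine each subdomain as an independent task."""
    with ThreadPoolExecutor(max_workers=coarse_workers) as pool:
        return list(pool.map(
            lambda sd: refine_subdomain(sd, max_area, fine_workers),
            subdomains))
```

For example, `refine_mesh([[((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))]], max_area=0.1)` refines the right triangle of area 0.5 twice, leaving 16 triangles of area 0.03125. Separating the two levels mirrors the abstract's point: the coarse level needs little coordination, while the fine level issues many small tasks per round. The fine level is therefore the part whose efficiency depends on how cheaply the hardware supports fine-grain synchronization.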