A fast algorithm for particle simulations
Journal of Computational Physics
Run-time scheduling and execution of loops on message passing machines
Journal of Parallel and Distributed Computing - Special issue: algorithms for hypercube computers
An implementation of the fast multipole method without multipoles
SIAM Journal on Scientific and Statistical Computing
The high performance Fortran handbook
The high performance Fortran handbook
A parallel hashed Oct-Tree N-body algorithm
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Journal of Parallel and Distributed Computing - Special issue on scalability of parallel algorithms and architectures
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Implementing O(N) N-body algorithms efficiently in data-parallel languages
Scientific Programming
High performance Fortran for highly irregular problems
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Compiler support for machine-independent parallelization of irregular problems
Compiler support for machine-independent parallelization of irregular problems
High Performance Fortran: Language Specification (PART II)
ACM SIGPLAN Fortran Forum - Special issue: high performance Fortran language specification, part 2
A Data Parallel Formulation of the Barnes-Hut Method for N -Body Simulations
PARA '00 Proceedings of the 5th International Workshop on Applied Parallel Computing, New Paradigms for HPC in Industry and Academia
User-controllable coherence for high performance shared memory multiprocessors
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Hi-index | 0.00 |
The efficiency of HPF with respect to irregular applications is still largely unproven. While recent work has shown that a highly irregular hierarchical n-body force calculation method can be implemented in HPF, we have found that the implmentation contains inefficiencies which cause it to run up to a factor of three times slower than our hand-coded, explicitly parallel implementation. Our work examines these inefficiencies, determines that most of the extra overhead is due to a single aspect of the communication strategy, and demonstrates that fixing the communication strategy can bring the overheads of the HPF application to within 25% of those of the hand-coded version.