This paper describes a technique for improving the data reference locality of parallel programs written in the Partitioned Global Address Space (PGAS) programming model. One of the principal challenges in writing PGAS applications is maximizing communication efficiency. The technique works on-line: it profiles data references at run time and uses the profile to organize fine-grained data elements into locality-aware blocks suitable for coarse-grained communication. It applies to parallel applications built on large, irregular, pointer-based data structures. The described system can perform automatic data relayout using the locality-aware mapping either in iterative (timestep-based) applications or as a collective data relayout operation. An empirical evaluation shows that the technique increases data reference locality and improves performance by 10-17% on the SPLASH-2 Barnes-Hut tree benchmark.
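To make the idea of profile-driven blocking concrete, the following is a minimal sketch in Python. It is not the paper's algorithm: the co-reference window, the greedy pairing heuristic, and the names `profile_affinity`, `build_blocks`, and `block_fetches` are all assumptions introduced for illustration. It records which elements are referenced close together in a trace, packs strongly co-referenced elements into the same fixed-size block, and counts block fetches as a rough proxy for coarse-grained communication.

```python
from collections import defaultdict


def profile_affinity(trace, window=2):
    """Count how often two distinct elements appear within `window` references."""
    affinity = defaultdict(int)
    for i, a in enumerate(trace):
        for b in trace[i + 1 : i + window]:
            if a != b:
                affinity[frozenset((a, b))] += 1
    return affinity


def build_blocks(trace, block_size, window=2):
    """Greedily pack elements into blocks, strongest co-reference pairs first.

    Hypothetical heuristic for illustration only. Element ids must be
    mutually comparable so that ties can be broken deterministically.
    """
    affinity = profile_affinity(trace, window)
    blocks = []    # blocks[i] is the list of elements in block i
    block_of = {}  # element -> index of the block that holds it

    def place(elem, idx):
        blocks[idx].append(elem)
        block_of[elem] = idx

    pairs = sorted(affinity.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for pair, _count in pairs:
        a, b = sorted(pair)
        if a not in block_of and b not in block_of:
            if block_size >= 2:
                blocks.append([])
                place(a, len(blocks) - 1)
                place(b, len(blocks) - 1)
        elif a in block_of and b not in block_of:
            if len(blocks[block_of[a]]) < block_size:
                place(b, block_of[a])
        elif b in block_of and a not in block_of:
            if len(blocks[block_of[b]]) < block_size:
                place(a, block_of[b])

    # Elements that could not join a partner go into the last open block.
    for elem in trace:
        if elem not in block_of:
            if not blocks or len(blocks[-1]) >= block_size:
                blocks.append([])
            place(elem, len(blocks) - 1)
    return blocks, block_of


def block_fetches(trace, block_of):
    """Count block fetches, assuming only one block is held locally at a time."""
    fetches, current = 0, None
    for elem in trace:
        if block_of[elem] != current:
            current = block_of[elem]
            fetches += 1
    return fetches


# Toy trace: elements 0 and 2 are used together, as are 1 and 3.
trace = [0, 2, 0, 2, 1, 3, 1, 3]
naive = {0: 0, 1: 0, 2: 1, 3: 1}          # layout by element id
blocks, profiled = build_blocks(trace, block_size=2)

print(blocks)                          # [[0, 2], [1, 3]]
print(block_fetches(trace, naive))     # 8
print(block_fetches(trace, profiled))  # 2
```

In this toy trace the id-ordered layout fetches a block on every reference, while the profiled layout fetches each block once. A real system would also need to weigh the cost of profiling and relayout against the savings, which this sketch ignores.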