Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
The design and analysis of spatial data structures
The design and analysis of spatial data structures
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
Managing pages in shared virtual memory systems: getting the compiler into the game
ICS '93 Proceedings of the 7th international conference on Supercomputing
Journal of Parallel and Distributed Computing - Special issue on scalability of parallel algorithms and architectures
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Efficient support for irregular applications on distributed-memory machines
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Reducing false sharing on shared memory multiprocessors through compile time data transformations
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Shasta: a low overhead, software-only approach for supporting fine-grain shared memory
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Scaling application performance on a cache-coherent multiprocessor
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Cache-conscious structure definition
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
New tiling techniques to improve cache temporal locality
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Application scaling under shared virtual memory on a cluster of SMPs
ICS '99 Proceedings of the 13th international conference on Supercomputing
Improving memory hierarchy performance for irregular applications
ICS '99 Proceedings of the 13th international conference on Supercomputing
Architectural Implications of a Family of Irregular Applications
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Comparative Evaluation of Fine- and Coarse-Grain Approaches for Software Distributed Shared Memory
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
SPLASH: Stanford parallel applications for shared-memory
SPLASH: Stanford parallel applications for shared-memory
Scaling irregular parallel codes with minimal programming effort
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
International Journal of Parallel Programming
Reducing Communication Cost for Parallelizing Irregular Scientific Codes
PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing
ARS: an adaptive runtime system for locality optimization
Future Generation Computer Systems - Tools for program development and analysis
SFCGen: A framework for efficient generation of multi-dimensional space-filling curves by recursion
ACM Transactions on Mathematical Software (TOMS)
CycleMeter: detecting fraudulent peers in internet cycle sharing
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Memory-access optimization of parallel molecular dynamics simulation via dynamic data reordering
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Insights for exascale IO APIs from building a petascale IO API
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
11 PFLOP/s simulations of cloud cavitation collapse
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
We demonstrate that data reordering can substantially improve the performance of fine-grained irregular shared-memory benchmarks, on both hardware and software shared-memory systems. In particular, we evaluate two distinct data reordering techniques that seek to co-locate in memory objects in close proximity in the physical system modeled by the computation. The effects of these techniques are increased spatial locality and reduced false sharing. We evaluate the effectiveness of the data reordering techniques on a set of five irregular applications from SPLASH-2 and Chaos. We implement both techniques in a small library, allowing us to enable them in an application by adding less than 10 lines of code. Our results on one hardware and two software shared-memory systems show that, with data reordering during initialization, the performance of these applications is improved by 12 percent to 99 percent on the Origin 2000, 30 percent to 366 percent on TreadMarks, and 14 percent to 269 percent on HLRC.