Computing Minimum-Weight Perfect Matchings
INFORMS Journal on Computing
Compiling for instruction cache performance on a multithreaded architecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Improving effective bandwidth through compiler enhancement of global cache reuse
Journal of Parallel and Distributed Computing
Array regrouping and structure splitting using whole-program reference affinity
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Automatic pool allocation: improving performance by controlling data structure layout in the heap
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Lightweight reference affinity analysis
Proceedings of the 19th annual international conference on Supercomputing
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Forma: A framework for safe automatic array reshaping
ACM Transactions on Programming Languages and Systems (TOPLAS)
MPADS: memory-pooling-assisted data splitting
Proceedings of the 7th international symposium on Memory management
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Analysis and approximation of optimal co-scheduling on chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Compiler techniques for reducing data cache miss rate on a multithreaded architecture
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Hi-index | 0.00 |
Array regrouping enhances program spatial locality by interleaving elements of multiple arrays that tend to be accessed closely. Its effectiveness has been systematically studied for sequential programs running on unicore processors, but not for multithreading programs on modern ChipMultiprocessor (CMP) machines. On one hand, the processor-level parallelism on CMP intensifies memory bandwidth pressure, suggesting the potential benefits of array regrouping for CMP computing. On the other hand, CMP architectures exhibit extra complexities-- especially the hierarchical, heterogeneous cache sharing among hyperthreads, cores, and processors--that impose new challenges to array regrouping. In this work, we initiate an exploration to the new opportunities and challenges. We propose cache-sharing-aware reference affinity analysis for identifying data affinity in multithreading applications. The analysis consists of affinity-guided thread scheduling and hierarchical reference-vector merging, handles cache sharing among both hyperthreads and cores, and offers hints for array regrouping and the avoidance of false sharing. Preliminary experiments demonstrate the potential of the techniques in improving locality of multithreading applications on CMP with various pitfalls avoided.