ICPP '02 Proceedings of the 2001 International Conference on Parallel Processing
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
An evaluation of global address space languages: co-array fortran and unified parallel C
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Communication Optimizations for Fine-Grained UPC Applications
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
A UPC Runtime System Based on MPI and POSIX Threads
PDP '06 Proceedings of the 14th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing
Shared memory programming for large scale machines
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Architectural support for operating system-driven CMP cache management
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Enhancements for hyper-threading technology in the operating system: seeking the optimal scheduling
WIESS'02 Proceedings of the 2nd conference on Industrial Experiences with Systems Software - Volume 2
Parallel simulation of Brownian dynamics on shared memory systems with OpenMP and Unified Parallel C
The Journal of Supercomputing
Hi-index | 0.00 |
Since multi-core computers began to dominate the market, enormous efforts have been spent on developing parallel programming languages and/or their compilers to target this architecture. Although Unified Parallel C (UPC), a parallel extension to ANSI C, was originally designed for large scale parallel computers and cluster environments, its partitioned global address space programming model makes it a natural choice for a single multi-core machine, where the main memory is physically shared. This paper builds a case for UPC as a feasible language for multi-core programming by providing an optimizing compiler, called ScaleUPC, which outperforms other UPC compilers targeting SMPs. As the communication cost for remote accesses is removed because all accesses are physically local in a multi-core, we find that the overhead of pointer arithmetic on shared data accesses becomes a prominent bottleneck. The reason is that directly mapping the UPC logical memory layout to physical memory, as used in most of the existing UPC compilers, incurs prohibitive address calculation overhead. This paper presents an alternative memory layout, which effectively eliminates the overhead without sacrificing the UPC memory semantics. Our research also reveals that the compiler for multi-core systems needs to pay special attention to the memory system. We demonstrate how the compiler can enforce static process/thread binding to improve cache performance.