Experiences with UPC on TILE-64 processor

Authors:
Olivier Serres;Ahmad Anbar;Saumil Merchant;Tarek El-Ghazawi
Affiliations:
NSF Center for High-Performance Reconfigurable Computing (CHREC), Dept. of Electrical and Computer Engineering, The George Washington University, 801 22nd St NW, 20052, USA (all authors)
Venue:
AERO '11 Proceedings of the 2011 IEEE Aerospace Conference
Year:
2011

Abstract

The partitioned global address space (PGAS) programming model presents programmers with a globally shared address space that is locality-aware and offers one-sided communication constructs. The shared address space and one-sided communication make PGAS-based languages easier to use, while locality awareness enables programmers and runtime systems to achieve higher performance. The PGAS programming model may therefore help address the escalating software complexity that results from the proliferation of many-core processor architectures in aerospace and in computing systems in general. This paper presents our experiences with Unified Parallel C (UPC), a PGAS language, on the Tile64™ processor, a 64-core processor from Tilera Corporation. We ported the Berkeley UPC compiler and runtime system to the Tilera architecture and evaluated two runtime implementation conduits of the underlying GASNet communication library: a pthreads-based conduit and an MPI-based conduit. Each conduit uses different on-chip, inter-core communication networks, which provide different latencies and bandwidths for inter-process communication. The paper presents the implementation details and an empirical analysis of both approaches, comparing results from the NAS Parallel Benchmark suite. The analysis reveals several optimization opportunities based on specific many-core architectural features, which are also discussed in the paper.
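
The locality awareness mentioned in the abstract can be illustrated with a small model of how a UPC shared array is laid out across threads. The sketch below is not taken from the paper, Berkeley UPC, or GASNet; it only assumes the standard UPC layout rule, in which element i of an array declared with block size B has affinity to thread (i / B) mod THREADS. The function names `owner` and `count_accesses` are hypothetical and exist only for this illustration.

```python
# Sketch of UPC-style data affinity (illustrative only; not Berkeley UPC
# or GASNet code). Assumption: a shared array with block size B is dealt
# out to threads in round-robin blocks, so element i has affinity to
# thread (i // B) % THREADS.

def owner(index: int, block_size: int, threads: int) -> int:
    """Return the thread that has affinity to element `index`."""
    return (index // block_size) % threads


def count_accesses(n: int, block_size: int, threads: int, me: int) -> tuple[int, int]:
    """Count (local, remote) accesses if thread `me` touches all n elements."""
    local = sum(1 for i in range(n) if owner(i, block_size, threads) == me)
    return local, n - local


if __name__ == "__main__":
    # 16 elements, blocks of 2, 4 threads: thread 0 owns elements 0, 1, 8, 9.
    print([owner(i, 2, 4) for i in range(16)])
    # A thread that walks the whole array touches mostly remote data.
    print(count_accesses(16, 2, 4, me=0))
```

In this model a thread that walks the whole array performs 4 local and 12 remote accesses, which suggests why restricting each thread to the elements it owns, or choosing a block size that matches the access pattern, can matter when remote accesses travel over an on-chip network.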