Exploring the performance of massively multithreaded architectures

Authors:
Shahid Bokhari;Joel Saltz
Affiliations:
Department of Biomedical Informatics, The Ohio State University, Columbus, OH, U.S.A.;Center for Comprehensive Informatics, Emory University, Atlanta, GA, U.S.A.
Venue:
Concurrency and Computation: Practice & Experience
Year:
2010

Abstract

We present a new scheme for evaluating the performance of multithreaded computers and demonstrate its application to the Cray MTA-2 and XMT supercomputers. Our scheme is based on the concept of clock cycles per element, $\mathcal{C}$, plotted against both problem size and the number of processors. This scheme clearly shows whether an implementation has achieved its asymptotic efficiency and is more general than (but includes) the commonly used speedup metric. It permits the discovery of any imperfections in both the software and the hardware, and is expected to permit a unified comparison of many different parallel architectures. Measurements on a number of well-known parallel algorithms, ranging from matrix multiply to quicksort, are presented for the MTA-2 and XMT and highlight some interesting differences between these machines. The performance of sequence alignment using dynamic programming is evaluated on the MTA-2, XMT, IBM x3755 and SGI Altix 350 and provides a useful comparison of the capabilities of the Cray machines with more conventional shared memory architectures. Copyright © 2009 John Wiley & Sons, Ltd.
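
The abstract does not state the formula for clock cycles per element, so the following is only a minimal sketch under an assumed definition: aggregate processor cycles (elapsed time times clock rate times processor count) divided by the number of elements processed. The function name `cycles_per_element`, its parameters, and the timings in the usage lines are hypothetical illustrations, not values or definitions from the paper.

```python
def cycles_per_element(elapsed_seconds, clock_hz, n_elements, n_processors=1):
    """Aggregate clock cycles spent per element.

    ASSUMED definition (not taken from the paper):
        C = elapsed_seconds * clock_hz * n_processors / n_elements
    """
    if n_elements <= 0 or n_processors <= 0:
        raise ValueError("n_elements and n_processors must be positive")
    total_cycles = elapsed_seconds * clock_hz * n_processors
    return total_cycles / n_elements


# Hypothetical timings on an imaginary 100 MHz machine, 1e6 elements.
c_one = cycles_per_element(0.50, 100e6, 1_000_000, n_processors=1)
c_two = cycles_per_element(0.25, 100e6, 1_000_000, n_processors=2)
print(c_one, c_two)
```

Under this assumed definition, a run time that halves when the processor count doubles leaves the value unchanged, which is one way a per-element measure could subsume speedup: a flat curve against processor count corresponds to ideal scaling, and any rise signals lost efficiency.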