Thread reinforcer: Dynamically determining number of threads via OS level monitoring

Authors:
Kishore Kumar Pusukuri;Rajiv Gupta;Laxmi N. Bhuyan
Affiliations:
Department of Computer Science and Engineering, University of California, Riverside, USA 92521;Department of Computer Science and Engineering, University of California, Riverside, USA 92521;Department of Computer Science and Engineering, University of California, Riverside, USA 92521
Venue:
IISWC '11 Proceedings of the 2011 IEEE International Symposium on Workload Characterization
Year:
2011

Citing 0
Cited 6

Thread Tranquilizer: Dynamically reducing performance variation
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers

Performance driven cooperation between kernel and auto-tuning multi-threaded interval b&b applications
ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part I

ADAPT: A framework for coscheduling multithreaded programs
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers

Adaptive parallelism for web search
Proceedings of the 8th ACM European Conference on Computer Systems

Tightfit: adaptive parallelization with foresight
Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering

Network-aware data caching and prefetching for cloud-hosted metadata retrieval
NDM '13 Proceedings of the Third International Workshop on Network-Aware Data Management

Abstract

It is often assumed that to maximize the performance of a multithreaded application, the number of threads created should equal the number of cores. While this may be true for systems with four or eight cores, it is not true for systems with a larger number of cores. Our experiments with PARSEC programs on a 24-core machine demonstrate this. Therefore, dynamically determining the appropriate number of threads for a multithreaded application is an important unsolved problem. In this paper we develop a simple technique for dynamically determining the appropriate number of threads without recompiling the application, using complex compilation techniques, or modifying Operating System policies. We first present a scalability study of eight programs from PARSEC, conducted on a 24-core Dell PowerEdge R905 server running OpenSolaris 2009.06, with thread counts ranging from a few threads to 128 threads. Our study shows that not only does the maximum speedup achieved by these programs vary widely (from 3.6x to 21.9x), but the number of threads that produces the maximum speedup also varies widely (from 16 to 63 threads). By understanding the overall speedup behavior of these programs, we identify the critical Operating System level factors that explain why the speedups vary with the number of threads. As an application of these observations, we develop a framework called "Thread Reinforcer" that dynamically monitors a program's execution to search for the number of threads that is likely to yield the best speedup. Thread Reinforcer identifies the optimal or near-optimal number of threads for most of the PARSEC programs studied, as well as for SPEC OMP and PBZIP2 programs.
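The abstract does not spell out how the search over thread counts proceeds, so the following is only an illustrative sketch of the general idea and not Thread Reinforcer's actual algorithm. It assumes a hypothetical callback `measure(n)` that runs the program briefly with `n` threads and returns the elapsed time, and it stops at the first candidate that no longer improves the runtime by a hypothetical relative margin `min_gain`. The cost model in `synthetic_runtime` is made up purely to exercise the search.

```python
from typing import Callable, Sequence


def search_thread_count(
    measure: Callable[[int], float],
    candidates: Sequence[int],
    min_gain: float = 0.02,
) -> int:
    """Pick a thread count by walking candidates in increasing order.

    measure(n) is a hypothetical callback (an assumption of this sketch):
    it runs the program briefly with n threads and returns elapsed seconds.
    The walk stops at the first candidate that fails to beat the best
    runtime seen so far by at least min_gain (a relative fraction).
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    ordered = sorted(set(candidates))
    best_n = ordered[0]
    best_time = measure(best_n)
    for n in ordered[1:]:
        elapsed = measure(n)
        if elapsed < best_time * (1.0 - min_gain):
            best_n, best_time = n, elapsed
        else:
            # No meaningful improvement: assume adding threads stopped helping.
            break
    return best_n


def synthetic_runtime(n: int) -> float:
    # Made-up cost model for illustration only: parallel work shrinks
    # with n while a contention term grows with n.
    return 100.0 / n + 0.1 * n


if __name__ == "__main__":
    print(search_thread_count(synthetic_runtime, [8, 16, 24, 32, 48, 64]))
```

A stop-at-first-regression walk like this can settle on a local optimum when speedup is not unimodal in the thread count. A real monitor would presumably also draw on the OS-level factors the abstract mentions rather than on runtime alone.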