Discovering and understanding performance bottlenecks in transactional applications

  • Authors:
  • Ferad Zyulkyarov; Srdjan Stipic; Tim Harris; Osman S. Unsal; Adrián Cristal; Ibrahim Hur; Mateo Valero

  • Affiliations:
  • BSC-Microsoft Research Centre, Universitat Politècnica de Catalunya - BarcelonaTech, Barcelona, Spain; BSC-Microsoft Research Centre, Universitat Politècnica de Catalunya - BarcelonaTech, Barcelona, Spain; Microsoft Research, Cambridge, United Kingdom; BSC-Microsoft Research Centre, Barcelona, Spain; BSC-Microsoft Research Centre, IIIA - Artificial Intelligence Research Institute CSIC - Spanish National Research Council, Barcelona, Spain; BSC-Microsoft Research Centre, Barcelona, Spain; BSC-Microsoft Research Centre, Universitat Politècnica de Catalunya - BarcelonaTech, Barcelona, Spain

  • Venue:
  • Proceedings of the 19th international conference on Parallel architectures and compilation techniques
  • Year:
  • 2010

Abstract

Many researchers have developed applications using transactional memory (TM) with the purpose of benchmarking different implementations, and studying whether or not TM is easy to use. However, comparatively little has been done to provide general-purpose tools for profiling and tuning programs which use transactions. In this paper we introduce a series of profiling techniques for TM applications that provide in-depth and comprehensive information about the wasted work caused by aborting transactions. We explore three directions: (i) techniques to identify multiple potential conflicts from a single program run, (ii) techniques to identify the data structures involved in conflicts by using a symbolic path through the heap, rather than a machine address, and (iii) visualization techniques to summarize how threads spend their time and which of their transactions conflict most frequently. To examine the effectiveness of the profiling techniques, we provide a series of illustrations from the STAMP TM benchmark suite and from the synthetic WormBench workload. We show how to use our profiling techniques to optimize the performance of the Bayes, Labyrinth and Intruder applications. We discuss the design and implementation of our techniques in the Bartok-STM system. We process data offline or during garbage collection, where possible, in order to minimize the probe effect introduced by profiling.