Performance debugging of parallel compression on multicore machines

  • Authors: Janusz Borkowski
  • Affiliations: Polish-Japanese Institute of Information Technology, Warsaw, Poland
  • Venue: PPAM'09 Proceedings of the 8th International Conference on Parallel Processing and Applied Mathematics: Part II
  • Year: 2009

Abstract

The power of contemporary processors relies increasingly on multicore architectures. This power is accessible only to parallel applications that can provide work for each core. Creating a scalable parallel or multithreaded application that uses the available cores efficiently is a difficult task, especially if I/O performance must be considered as well. We consider a multithreaded database loader with a compression function and examine its performance from a number of perspectives. Because compression is computationally intensive, parallel execution can potentially provide a large advantage in this case. We present and discuss a list of the performance-related areas we encountered, and we identify and verify tools for dealing with specific performance areas. We find that only an orchestrated use of several tools brings the desired effect. The discussion provides a general procedure one can follow when improving the performance of multithreaded programs, and it points out key performance areas specific to the database loader. Special attention is given to performance variations observed when many parallel threads are active on a multicore CPU. A significant slowdown of computations is observed when many threads are computing simultaneously. The slowdown is related mainly to memory access and cache behavior, and it is much larger on a Core2 Quad system than on a dual Xeon machine.
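
The kind of measurement the abstract describes, per-task compute time as the number of simultaneously active threads grows, can be illustrated with a minimal sketch. This is not the loader from the paper: the function names (`timed_compress`, `measure`, `make_chunks`), the synthetic data, and the use of Python's `zlib` with a thread pool are all assumptions made for illustration. It relies on CPython's `zlib` generally releasing the global interpreter lock while compressing, so the threads can run on separate cores.

```python
import time
import zlib
from concurrent.futures import ThreadPoolExecutor


def timed_compress(chunk: bytes) -> tuple:
    """Compress one chunk; return (compressed size, seconds spent computing)."""
    start = time.perf_counter()
    compressed = zlib.compress(chunk, 6)
    return len(compressed), time.perf_counter() - start


def measure(chunks: list, workers: int) -> dict:
    """Compress all chunks with `workers` threads and collect timings."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(timed_compress, chunks))
    wall = time.perf_counter() - wall_start
    per_chunk = [seconds for _, seconds in results]
    return {
        "workers": workers,
        "wall_seconds": wall,
        # If this value grows with `workers`, each thread is computing more
        # slowly while the others are active.
        "mean_chunk_seconds": sum(per_chunk) / len(per_chunk),
        "compressed_bytes": sum(size for size, _ in results),
    }


def make_chunks(count: int, size: int) -> list:
    """Build synthetic, compressible chunks (a stand-in for database rows)."""
    chunks = []
    for i in range(count):
        row = f"row-{i:06d};value={i * 7919 % 1000};".encode()
        chunks.append((row * (size // len(row) + 1))[:size])
    return chunks


if __name__ == "__main__":
    data = make_chunks(count=64, size=1 << 20)
    for n in (1, 2, 4, 8):
        r = measure(data, n)
        print(f"{r['workers']} threads: wall {r['wall_seconds']:.3f} s, "
              f"mean per chunk {r['mean_chunk_seconds']:.4f} s")
```

Comparing `mean_chunk_seconds` across thread counts separates two effects: wall time shows overall scaling, while the per-chunk figure shows whether individual computations slow down under concurrency. The per-chunk time also includes any time a thread spends descheduled, so the thread count should stay at or below the number of cores for the comparison to be meaningful. A sketch like this only exposes the symptom; attributing it to memory access or cache behavior, as the abstract does, would require hardware-level profiling tools.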