Automatic parallelization of fine-grained meta-functions on a chip multiprocessor

  • Authors:
  • Sanghoon Lee; James Tuck

  • Affiliations:
  • Department of Electrical & Computer Engineering, North Carolina State University (both authors)

  • Venue:
  • CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
  • Year:
  • 2011

Abstract

Due to the importance of reliability and security, prior studies have proposed inlining meta-functions into applications to detect bugs and security vulnerabilities. However, because these software techniques add frequent, fine-grained instrumentation to programs, they often incur large runtime overheads. In this work, we consider an automatic thread extraction technique for removing these fine-grained checks from a main application and scheduling them on helper threads. In this way, we can leverage the resources available on a CMP to reduce the latency and overhead of fine-grained checking code. Our parallelization strategy automatically extracts meta-functions from the main application and executes them in customized helper threads--threads constructed to mirror relevant fragments of the main program's behavior in order to keep communication and overhead low. To get good performance, we consider optimizations that reduce communication and balance work among many threads. We evaluate our parallelization strategy on Mudflap, a pointer-use checking tool in GCC. To show the benefits of our technique, we compare it to a manually parallelized version of Mudflap. We run our experiments on an architectural simulator with support for fast queueing operations. On a subset of SPECint 2000, our automatically parallelized code is only 29% slower, on average, than the manually parallelized version on a simulated 8-core system. Furthermore, two applications achieve better speedups using our algorithms than with the manual approach. Also, our approach introduces very little overhead in the main program--it is kept under 100%, which is more than a 5.3x reduction compared to serial Mudflap.
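
Below is a minimal sketch, in C with POSIX threads, of the helper-thread checking pattern the abstract describes: the main thread enqueues the arguments of each extracted pointer-use check into a shared queue, and a helper thread dequeues and performs the check off the critical path. All names here are illustrative, not the paper's API; the paper assumes architectural support for fast queueing operations, which this mutex/condition-variable ring buffer only approximates, and the NULL-address check stands in for Mudflap's actual pointer-use checks.

/* Illustrative sketch only: not the paper's implementation. A software
 * ring buffer stands in for the hardware-accelerated queues the paper
 * assumes, and the "check" is a hypothetical placeholder. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define QCAP 1024

typedef struct {            /* one extracted pointer-use check */
    void   *addr;           /* address the main thread accessed */
    size_t  size;           /* bytes accessed */
} check_t;

static check_t  queue[QCAP];
static size_t   head, tail;            /* ring-buffer indices */
static int      done;                  /* main thread has finished */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  nonfull  = PTHREAD_COND_INITIALIZER;

/* Replaces an inlined meta-function call in the main thread: instead of
 * running the check, the main thread only enqueues its arguments. */
static void enqueue_check(void *addr, size_t size) {
    pthread_mutex_lock(&lock);
    while ((tail + 1) % QCAP == head)          /* queue full: wait */
        pthread_cond_wait(&nonfull, &lock);
    queue[tail] = (check_t){ addr, size };
    tail = (tail + 1) % QCAP;
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&lock);
}

/* Helper thread mirrors the checking work off the main thread's path. */
static void *helper_thread(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (head == tail && !done)
            pthread_cond_wait(&nonempty, &lock);
        if (head == tail && done) {            /* drained and finished */
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        check_t c = queue[head];
        head = (head + 1) % QCAP;
        pthread_cond_signal(&nonfull);
        pthread_mutex_unlock(&lock);

        /* Hypothetical check standing in for a Mudflap-style
         * pointer-use check. */
        if (c.addr == NULL)
            fprintf(stderr, "violation: NULL access, %zu bytes\n", c.size);
    }
}

int main(void) {
    pthread_t helper;
    pthread_create(&helper, NULL, helper_thread, NULL);

    int data[16];
    for (int i = 0; i < 16; i++) {
        enqueue_check(&data[i], sizeof data[i]);  /* extracted check */
        data[i] = i;                              /* original work */
    }

    pthread_mutex_lock(&lock);
    done = 1;                                     /* signal shutdown */
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&lock);
    pthread_join(helper, NULL);
    return 0;
}

A single-producer, single-consumer ring buffer is used here to keep the enqueue path short, since any cost added to the main thread counts directly toward the instrumentation overhead the technique is trying to hide.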