Automatic parallelization of fine-grained metafunctions on a chip multiprocessor

  • Authors:
  • Sanghoon Lee; James Tuck

  • Affiliations:
  • North Carolina State University, Department of Electrical and Computer Engineering, Raleigh, NC (both authors)

  • Venue:
  • ACM Transactions on Architecture and Code Optimization (TACO)
  • Year:
  • 2013

Abstract

Given the importance of reliability and security, prior studies have proposed inlining metafunctions into applications to detect bugs and security vulnerabilities. However, because these software techniques add frequent, fine-grained instrumentation to programs, they often incur large runtime overheads. In this work, we consider an automatic thread-extraction technique that removes these fine-grained checks from the main application and schedules them on helper threads. In this way, we can leverage the resources available on a CMP to reduce the latency and overhead of fine-grained checking code. Our parallelization strategy extracts metafunctions from a single-threaded application and executes them in customized helper threads, which are constructed to mirror relevant fragments of the main program's behavior in order to keep communication and overhead low. To achieve good performance, we apply optimizations that reduce communication and balance work among many threads. We evaluate our parallelization strategy on Mudflap, a pointer-use checking tool in GCC, and compare it to a manually parallelized version of Mudflap, running our experiments on an architectural simulator with support for fast queueing operations. On a subset of SPECint 2000, our automatically parallelized code with static load balancing is only 19% slower, on average, than the manually parallelized version on a simulated eight-core system, and with dynamic load balancing it is competitive, on average, with the manually parallelized version. Furthermore, all the applications except parser achieve better speedups with our automatic algorithms than with the manual approach. Our approach also introduces very little overhead in the main program: it stays under 100%, more than a 5.3× reduction compared to serial Mudflap.
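
As a rough illustration of the decoupled-checking idea the abstract describes (a sketch only; the paper's actual system uses compiler-generated helper threads and architecturally supported fast queues, and every name below is hypothetical), the following C++ fragment offloads a Mudflap-style pointer-use check from the main thread to a helper thread through a software queue:

    #include <condition_variable>
    #include <cstddef>
    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <thread>

    // Hypothetical check record: the main thread ships the arguments of a
    // fine-grained metafunction (here, a pointer-use check) to a helper.
    struct CheckRequest {
        const void* ptr;    // pointer being accessed
        std::size_t size;   // bytes accessed
    };

    // Simple lock-based stand-in for the fast hardware queue assumed
    // in the paper's simulated CMP.
    class CheckQueue {
        std::queue<CheckRequest> q_;
        std::mutex m_;
        std::condition_variable cv_;
        bool done_ = false;
    public:
        void push(CheckRequest r) {
            { std::lock_guard<std::mutex> lk(m_); q_.push(r); }
            cv_.notify_one();
        }
        bool pop(CheckRequest& r) {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [&] { return !q_.empty() || done_; });
            if (q_.empty()) return false;       // closed and drained
            r = q_.front(); q_.pop();
            return true;
        }
        void close() {
            { std::lock_guard<std::mutex> lk(m_); done_ = true; }
            cv_.notify_all();
        }
    };

    // Stand-in for a Mudflap-style metafunction; a real check would
    // consult per-object metadata rather than test for null.
    void check_pointer(const void* ptr, std::size_t size) {
        if (ptr == nullptr)
            std::fprintf(stderr, "bad access of %zu bytes\n", size);
    }

    int main() {
        CheckQueue queue;
        // Helper thread: drains check requests asynchronously, keeping
        // the checking work off the main thread's critical path.
        std::thread helper([&] {
            CheckRequest r;
            while (queue.pop(r)) check_pointer(r.ptr, r.size);
        });

        int data[64] = {0};
        for (int i = 0; i < 64; ++i) {
            queue.push({&data[i], sizeof data[i]}); // enqueue check, don't wait
            data[i] = i;                            // main work proceeds
        }
        queue.close();
        helper.join();
        return 0;
    }

In this sketch the main thread only pays the cost of an enqueue per check; the point of the paper's fast queueing support is to make that per-check cost small enough that even very fine-grained instrumentation stays cheap on the main thread.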