TAPE: a transactional application profiling environment

Authors:
Hassan Chafi;Chi Cao Minh;Austen McDonald;Brian D. Carlstrom;JaeWoong Chung;Lance Hammond;Christos Kozyrakis;Kunle Olukotun
Affiliations:
Stanford University;Stanford University;Stanford University;Stanford University;Stanford University;Stanford University;Stanford University;Stanford University
Venue:
Proceedings of the 19th annual international conference on Supercomputing
Year:
2005

Abstract

Transactional Coherence and Consistency (TCC) provides a new parallel programming model that uses transactions as the basic unit of parallel work and communication. TCC simplifies the development of correct parallel code because hardware provides transaction atomicity and ordering. Nevertheless, the programmer or a dynamic compiler must still optimize the parallel code for performance.

This paper presents TAPE, a hardware and software infrastructure for profiling in TCC systems. TAPE extends the hardware for transactional execution to identify performance impediments such as dependence violations, buffer overflows, and work imbalance. It filters infrequent events to reduce resource requirements and allows the programmer to focus on the most important bottlenecks. We demonstrate that TAPE introduces minimal die area and performance overhead and can be used continuously, even for production runs. Moreover, we demonstrate how to leverage the profiling information to guide optimization for a set of parallel applications. TAPE accurately identifies the source code location and type of the most important bottlenecks, allowing a programmer to achieve maximum parallel speedup with a few profiling steps.
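
The idea of filtering infrequent events and ranking the remaining bottlenecks by source location and type can be illustrated with a small software sketch. This is not TAPE's hardware mechanism, which the abstract does not detail. The `Event` record, the event labels, the `min_count` threshold, and the use of lost cycles as the cost metric are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    """One profiled event (hypothetical record layout, not TAPE's format)."""
    kind: str      # assumed labels: "violation", "overflow", "imbalance"
    location: str  # hypothetical source location such as "loop.c:42"
    cycles: int    # assumed cost metric: cycles lost to this event


def top_bottlenecks(events, min_count=2, limit=3):
    """Aggregate events per (kind, location), drop infrequent sites,
    and return the costliest remaining sites in descending order."""
    count = defaultdict(int)
    cycles = defaultdict(int)
    for ev in events:
        key = (ev.kind, ev.location)
        count[key] += 1
        cycles[key] += ev.cycles
    # Keep only sites seen at least min_count times (the filtering step).
    frequent = [(key, cycles[key]) for key in count if count[key] >= min_count]
    frequent.sort(key=lambda item: item[1], reverse=True)
    return frequent[:limit]


if __name__ == "__main__":
    sample = [
        Event("violation", "loop.c:42", 500),
        Event("violation", "loop.c:42", 700),
        Event("overflow", "init.c:10", 300),
        Event("imbalance", "main.c:88", 200),
        Event("imbalance", "main.c:88", 100),
    ]
    for (kind, location), lost in top_bottlenecks(sample):
        print(f"{kind:10s} {location:12s} {lost} cycles")
```

In this sketch the single buffer overflow is filtered out as infrequent, and the report lists the dependence violation site ahead of the work imbalance site because it accounts for more lost cycles.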