Performance evaluation is key to optimizing applications on multicore systems. Many techniques and profiling tools are available for measuring performance on homogeneous multicore platforms, but most of them depend on hardware support from the vendor. For heterogeneous multicore systems, very few analysis tools exist to help developers. This paper describes a software-based trace collection and performance analysis framework that can be ported to a variety of platforms through source-level code instrumentation. To support this framework, we implemented a pure-software profiling toolkit, called ParallelTracer, based on ANTLR, an open source parser generator. We present the framework and the toolkit, and use the IBM Cell processor as a case study to demonstrate the capability of ParallelTracer. Our results show that ParallelTracer provides useful information, through graphical visualization, that helps programmers understand program behavior and identify potential performance bottlenecks. We also discuss the runtime overhead of ParallelTracer. With proper usage, the performance and code size overheads introduced by our toolkit are limited to around 19% to 5% and 9%, respectively, for the benchmark program in the case study.
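As a rough illustration of what software-only trace collection through source-level instrumentation can look like, the sketch below wraps functions so that every call records entry and exit events with timestamps, and then derives per-function call counts and inclusive times from the trace. It is not ParallelTracer's implementation: the names `TRACE`, `traced`, `summarize`, `dot`, and `kernel` are hypothetical, the instrumentation is applied by hand rather than inserted by an ANTLR-generated parser, and Python stands in for the C code one would instrument on the Cell processor.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory trace buffer: (event, function name, timestamp in ns).
TRACE = []


def traced(func):
    """Wrap func so each call appends ENTER and EXIT events to TRACE."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append(("ENTER", func.__name__, time.perf_counter_ns()))
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if func raises, so ENTER/EXIT stay balanced.
            TRACE.append(("EXIT", func.__name__, time.perf_counter_ns()))
    return wrapper


def summarize(trace):
    """Return call count and total inclusive time (ns) per function name."""
    stack = []
    totals = defaultdict(lambda: {"calls": 0, "ns": 0})
    for event, name, ts in trace:
        if event == "ENTER":
            stack.append((name, ts))
        else:
            start_name, start_ts = stack.pop()
            assert start_name == name, "unbalanced trace"
            totals[name]["calls"] += 1
            totals[name]["ns"] += ts - start_ts
    return dict(totals)


@traced
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


@traced
def kernel(n):
    v = list(range(n))
    return dot(v, v)


result = kernel(1000)
report = summarize(TRACE)
```

Because the timing calls run inside the instrumented program, they add both execution time and code size, which is the kind of overhead the abstract reports for the case study.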