Trace-based Performance Analysis on Cell BE
ISPASS '08: Proceedings of the ISPASS 2008 IEEE International Symposium on Performance Analysis of Systems and Software
Cell BE is a heterogeneous multicore processor designed for the efficient execution of parallel and vectorizable applications with high computation and memory requirements. The transition to multicores introduces the challenge of providing tools that help programmers tune code running on these architectures. Tracing tools, in particular, often help locate performance problems related to communication between threads and processes.

A major impediment to implementing tracing on Cell is the absence of a common clock that all cores can access at low cost. The OS clock is costly to access from the auxiliary cores, and the hardware timers cannot be set simultaneously on all cores. In this paper, we describe an offline trace analysis that assigns wall-clock time to trace records based on their thread-local time stamps and the order of events. Our experiments on several Cell SDK workloads show that the indeterminism in the assigned wall-clock time is low: on average 20 to 40 clock ticks (1.4 to 2.8 μs with a 14.8 MHz clock). We also show how practical problems, such as imprecise time measurement, can be overcome.
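The abstract does not spell out how wall-clock time is assigned. One way to frame the problem is as a system of difference constraints. This framing assumes that every thread-local clock ticks at the same rate and differs from wall-clock time only by an unknown constant offset. Each observed ordering (for example, a message send that must precede its receive) then bounds the difference between two threads' offsets. An all-pairs shortest-path closure (Floyd-Warshall here) turns those bounds into an interval of feasible offsets per thread. The width of that interval, in ticks, is the kind of indeterminism the abstract reports. This is a minimal sketch under those assumptions, not the paper's algorithm. The `Order` record, the `offset_bounds` function, and the `min_latency` parameter are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical trace fact: the event at local time t_before on thread
    `before` happened before the event at local time t_after on thread `after`."""
    before: int
    t_before: int
    after: int
    t_after: int
    min_latency: int = 0  # assumed lower bound on delivery delay, in ticks


def offset_bounds(num_threads, orders):
    """Return [(lo, hi), ...]: bounds on each thread's clock offset relative
    to thread 0, assuming global = local + offset[i] and equal tick rates.

    Each ordering gives t_before + o_b + latency <= t_after + o_a, i.e. the
    difference constraint o_b - o_a <= t_after - t_before - latency.
    """
    n = num_threads
    # d[i][j] holds the tightest known upper bound on o_j - o_i.
    d = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for o in orders:
        w = o.t_after - o.t_before - o.min_latency
        d[o.after][o.before] = min(d[o.after][o.before], w)

    # Floyd-Warshall closure: o_j - o_i <= d[i][k] + d[k][j].
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]

    # A negative cycle means the observed orderings cannot all hold.
    if any(d[i][i] < 0 for i in range(n)):
        raise ValueError("inconsistent ordering constraints (negative cycle)")

    # With o_0 fixed at 0: o_i <= d[0][i] and o_i >= -d[i][0].
    return [(-d[i][0], d[0][i]) for i in range(n)]


if __name__ == "__main__":
    # Invented example: thread 0 sends at local 100 and thread 1 receives
    # at local 30; thread 1 replies at local 50 and thread 0 receives at 160.
    trace = [Order(0, 100, 1, 30), Order(1, 50, 0, 160)]
    print(offset_bounds(2, trace))  # [(0, 0), (70, 110)]
```

In this invented trace, thread 1's offset lies in [70, 110]. Any of its events can therefore be placed on the global timeline only to within 40 ticks. The two messages bracket the offset from both sides, so more communication generally narrows the interval. Threads with no path to thread 0 stay unbounded. A negative cycle is one possible symptom of imprecise time stamps. How the paper itself handles such cases is not described in the abstract.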