Simulating Commercial Java Throughput Workloads: A Case Study
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Spin Detection Hardware for Improved Management of Multithreaded Systems
IEEE Transactions on Parallel and Distributed Systems
PAM: a novel performance/power aware meta-scheduler for multi-core systems
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Achieving predictable performance through better memory controller placement in many-core CMPs
Proceedings of the 36th annual international symposium on Computer architecture
Detecting phases in parallel applications on shared memory architectures
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Hi-index | 0.00 |
Recent work has shown that multithreaded workloads running in execution-driven, full-system simulation environments cannot use instructions per cycle (IPC) as a valid performance metric due to non-deterministic program behavior. Unfortunately, invalidating IPC as a performance metric introduces its own host of difficulties: special workload setup, consideration of cold-startand end-effects, statistical methodologies leading to increased simulation bandwidth, and workload-specific, higher-level metrics to measure performance. This paper explores the non-determinism problem in multithreaded programs, describes a method to eliminate non-determinism across simulations of different experimental machine models, and demonstrates the suitability of this methodology for performing architectural performance analysis, thus redeeming IPC as a performance metric for multithreaded programs.