Performance Studies of Commercial Workloads on a Multi-core System

Authors:
Jessica H. Tseng;Hao Yu;Shailabh Nagar;Niteesh Dubey;Hubertus Franke;Pratap Pattnaik;Hiroshi Inoue;Toshio Nakatani
Affiliations:
IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598-0218, US. jhtseng@us.ibm.com;IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598-0218, US. yuh@us.ibm.com;IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598-0218, US. shailabh@us.ibm.com;IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598-0218, US. niteesh@us.ibm.com;IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598-0218, US. frankeh@us.ibm.com;IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598-0218, US. pratap@us.ibm.com;IBM Tokyo Research Lab., 1623-14 Shimo-tsuruma, Yamato-shi, Kanagawa 242-8502, Japan. inouehrs@jp.ibm.com;IBM Tokyo Research Lab., 1623-14 Shimo-tsuruma, Yamato-shi, Kanagawa 242-8502, Japan. nakatani@jp.ibm.com
Venue:
IISWC '07 Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization
Year:
2007

Abstract

The multi-threaded nature of many commercial applications makes them a seemingly good fit for the growing number of multi-core architectures. This paper presents our performance studies of a collection of commercial workloads on a multi-core system designed for total throughput. The selected workloads include full operational applications such as SAP-SD and IBM Trade, and popular synthetic benchmarks such as SPECjbb2005, SPEC SDET, Dbench, and Tbench. To evaluate performance scalability and sensitivity to thread placement, we monitor application throughput, processor performance, and the memory subsystem with 8, 16, 24, and 32 hardware threads, obtained by (a) increasing the number of cores and (b) increasing the number of threads per core. We observe that these workloads scale close to linearly with the number of cores, with efficiencies ranging from 86% to 99%. When scaling the number of hardware threads per core, efficiencies are between 50% and 70%. Among other observations, our data show that the system's ability to hide long-latency memory operations (i.e., L2 misses) enables this performance scaling.
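The abstract quotes scaling efficiencies but does not define them. A common definition, assumed here rather than taken from the paper, is the observed speedup over the smallest configuration divided by the ideal linear speedup. The sketch below uses a hypothetical helper `scaling_efficiency` and made-up throughput figures purely for illustration; they are not measurements from this study.

```python
def scaling_efficiency(base_threads, base_throughput, threads, throughput):
    """Hypothetical helper: observed speedup divided by ideal linear speedup.

    Assumes efficiency is measured relative to a baseline configuration
    (e.g. 8 hardware threads); this definition is an assumption.
    """
    if base_threads <= 0 or base_throughput <= 0:
        raise ValueError("baseline threads and throughput must be positive")
    observed_speedup = throughput / base_throughput  # measured gain
    ideal_speedup = threads / base_threads           # perfect linear gain
    return observed_speedup / ideal_speedup


# Illustrative (invented) throughputs for 8, 16, 24 and 32 hardware threads.
baseline = (8, 100.0)
samples = [(16, 196.0), (24, 285.0), (32, 360.0)]

for n, tput in samples:
    eff = scaling_efficiency(baseline[0], baseline[1], n, tput)
    print(f"{n:2d} threads: efficiency {eff:.0%}")
```

With these invented numbers, going from 8 to 32 threads yields a 3.6x gain against an ideal 4x, i.e. 90% efficiency. Under this definition, an efficiency of 50% to 70% when adding threads per core means each extra hardware thread contributes only part of a core's worth of throughput.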