Experiences in building and scaling an enterprise application on multicore systems

  • Authors:
  • Seetharami Seelam;Yanbin Liu;Parijat Dube;Megumi Ito;Deniz Binay;Michael Dawson;Pramod Nagaraja;Graeme Johnson;Liana Fong;Michel Hack;Xiaoqiao Meng;Yuqing Gao;Li Zhang

  • Affiliations:
  • IBM Thomas J. Watson Research Center, Yorktown Heights, NYUSA;IBM Thomas J. Watson Research Center, Yorktown Heights, NYUSA;IBM Thomas J. Watson Research Center, Yorktown Heights, NYUSA;IBM Tokyo Research Lab, Tokyo, Japan;IBM Thomas J. Watson Research Center, Yorktown Heights, NYUSA;IBM Software Group, Ottawa, Canada;IBM Software Group, Bangalore, India;IBM Software Group, Ottawa, Canada;IBM Thomas J. Watson Research Center, Yorktown Heights, NYUSA;IBM Thomas J. Watson Research Center, Yorktown Heights, NYUSA;IBM Thomas J. Watson Research Center, Yorktown Heights, NYUSA;IBM Thomas J. Watson Research Center, Yorktown Heights, NYUSA;IBM Thomas J. Watson Research Center, Yorktown Heights, NYUSA

  • Venue:
  • Concurrency and Computation: Practice & Experience
  • Year:
  • 2012

Abstract

Even though Java is the de facto programming language for enterprise applications, only a limited number of Java-based benchmarks exist for understanding performance on emerging multicore systems. To bridge this gap, this paper presents a report generation benchmark built on top of the open-source Apache Geronimo DayTrader benchmark. Report generation and rendering are at the heart of many enterprise business analytics and business intelligence software products and are used by many enterprise applications. We evaluate the performance scalability of this benchmark on a state-of-the-art Power7 multicore system with eight Power7 cores and 32 hardware threads. The benchmark throughput scales linearly up to eight hardware threads, but beyond that point the throughput falls sharply. This performance drop results from significant locking in the Java class libraries for non-shared objects. Splitting the locks on these shared classes results in near-linear scaling from eight to 32 threads and improves the throughput by 80%. We also show that Linux operating system load balancing can degrade application performance on hardware-multithreaded systems, and that simultaneous-multithreading-aware task scheduling yields uniform core-resource utilization as well as improved application performance. Copyright © 2011 John Wiley & Sons, Ltd.
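
The lock splitting mentioned in the abstract can be illustrated with a minimal Java sketch. This is not the paper's actual change to the Java class libraries, which the abstract does not describe in detail. The class name StripedCounter, the number of stripes, and the counter workload are all hypothetical, chosen only to show the general idea: state guarded by one class-wide lock is partitioned so that each partition has its own lock, and threads working on different partitions no longer contend.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of lock splitting; not code from the paper
// or from the Java class libraries.
class StripedCounter {
    private final ReentrantLock[] locks; // one lock per stripe instead of one global lock
    private final long[] counts;         // counts[s] is guarded by locks[s]

    StripedCounter(int stripes) {
        locks = new ReentrantLock[stripes];
        counts = new long[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new ReentrantLock();
        }
    }

    // A key maps to exactly one stripe, so only that stripe's lock is taken.
    void increment(int key) {
        int s = Math.floorMod(key, locks.length);
        locks[s].lock();
        try {
            counts[s]++;
        } finally {
            locks[s].unlock();
        }
    }

    // Sums all stripes, taking each stripe's lock in turn.
    long total() {
        long sum = 0;
        for (int s = 0; s < locks.length; s++) {
            locks[s].lock();
            try {
                sum += counts[s];
            } finally {
                locks[s].unlock();
            }
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        StripedCounter counter = new StripedCounter(8);
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            final int id = t; // each worker uses its own key, hence its own stripe
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    counter.increment(id);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println("total = " + counter.total());
    }
}
```

With a single lock, every call to increment would serialize across all threads. With one lock per stripe, the four workers in main touch disjoint stripes and never wait on one another, which is the kind of contention reduction the abstract attributes to splitting locks.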