Addressing Cache/Memory Overheads in Enterprise Java CMP Servers

Authors:
Kumar Shiv;Ravi Iyer;Mahesh Bhat;Ramesh Illikkal;Michael Jones;Srihari Makineni;Jason Domer;Don Newell
Affiliations:
Intel Corporation;Intel Corporation;Intel Corporation;Intel Corporation;Intel Corporation;Intel Corporation;Intel Corporation;Intel Corporation
Venue:
IISWC '07 Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization
Year:
2007

Cited 3

Allocation wall: a limiting factor of Java applications on emerging multi-core platforms
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications

Tale in the multi-core era: is java still competitive to host SIP applications?
ICC'09 Proceedings of the 2009 IEEE international conference on Communications

THOR: a performance analysis tool for java applications running on multicore systems
IBM Journal of Research and Development

Abstract

As we enter the era of chip multiprocessor (CMP) architectures, it is important to explore the scaling characteristics of mainstream server workloads on these platforms. In this paper, we analyze the performance of two significant Enterprise Java workloads (SPECjAppServer2004 and SPECjbb2005) on present and future CMP platforms. We start by characterizing the core, cache, and memory behavior of these workloads on the newly released Intel Core 2 Duo Xeon platform (dual-core, dual-socket). These measurements indicate that the performance of both workloads depends significantly on the cache and memory subsystems. To guide the evolution of future CMP platforms, we perform a detailed investigation of potential cache and memory architecture choices, including the effects of thread sharing and migration, object allocation, and garbage collection. Based on the observed behavior, we propose architectural optimizations along three dimensions: (a) data-less cache line initialization (DCLI), (b) hardware-guided thread collocation (HGTC), and (c) on-socket DRAM caches (OSDC). We describe these optimizations in detail and validate their performance potential using trace-driven simulations and execution-driven emulation. Overall, we expect the findings in this paper to guide future CMP architectures for Enterprise Java servers.
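
To make the idea of a trace-driven cache simulation concrete, here is a minimal Python sketch. It is not the authors' simulator, and the abstract does not say how DCLI works. The sketch assumes, purely for illustration, that a line about to be fully initialized by object allocation can be installed in the cache without first being fetched from memory. The trace format, the `LRUCache` class, the `skip_fetch_on_init` flag, and all sizes are hypothetical. Write-backs, set associativity, and multiple cores are not modeled.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed value)


class LRUCache:
    """Fully associative LRU cache that counts line fetches from memory."""

    def __init__(self, num_lines, skip_fetch_on_init=False):
        self.num_lines = num_lines
        # Assumption: models a DCLI-like optimization, not the paper's design.
        self.skip_fetch_on_init = skip_fetch_on_init
        self.lines = OrderedDict()  # line number -> None, kept in LRU order
        self.memory_fetches = 0

    def access(self, kind, address):
        line = address // LINE_SIZE
        if line in self.lines:
            self.lines.move_to_end(line)  # hit: mark as most recently used
            return
        # Miss: normally the line is read from memory before it is used.
        if not (kind == "init" and self.skip_fetch_on_init):
            self.memory_fetches += 1
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[line] = None


def run(trace, num_lines, skip_fetch_on_init):
    """Replay a trace of (kind, address) pairs and return memory fetches."""
    cache = LRUCache(num_lines, skip_fetch_on_init)
    for kind, address in trace:
        cache.access(kind, address)
    return cache.memory_fetches


# Hypothetical trace: initialize 8 freshly allocated lines, then read them back.
trace = [("init", i * LINE_SIZE) for i in range(8)]
trace += [("read", i * LINE_SIZE) for i in range(8)]

baseline = run(trace, num_lines=16, skip_fetch_on_init=False)
with_skip = run(trace, num_lines=16, skip_fetch_on_init=True)
print(baseline, with_skip)
```

In this toy trace every memory fetch in the baseline comes from initializing fresh lines, so the flag removes all of them. A real allocation-heavy workload would show a smaller effect that depends on cache size and reuse distance.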