Exploring multi-threaded Java application performance on multicore hardware

Authors:
Jennifer B. Sartor;Lieven Eeckhout
Affiliations:
Ghent University, Ghent, Belgium;Ghent University, Ghent, Belgium
Venue:
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Year:
2012

Abstract

While there have been many studies of how to schedule applications to take advantage of increasing numbers of cores in modern-day multicore processors, few have focused on multi-threaded managed-language applications, which are prevalent from the embedded to the server domain. Managed languages complicate performance studies because they have additional virtual machine threads that collect garbage and dynamically compile, closely interacting with application threads. Further complexity is introduced as modern multicore machines have multiple sockets and dynamic frequency scaling options, broadening opportunities to reduce both power and running time. In this paper, we explore the performance of Java applications, studying how best to map application and Java virtual machine (JVM) threads to a multicore, multi-socket environment. We explore both the cost of separating JVM threads from application threads and the opportunity to speed up or slow down the clock frequency of isolated threads. We perform experiments with the multi-threaded DaCapo benchmarks and pseudojbb2005 running on the Jikes Research Virtual Machine on a dual-socket, 8-core Intel Nehalem machine, revealing several novel, and sometimes counter-intuitive, findings. We believe these insights are a first but important step towards understanding and optimizing managed-language performance on modern hardware.