Allocation wall: a limiting factor of Java applications on emerging multi-core platforms

Authors:
Yi Zhao;Jin Shi;Kai Zheng;Haichuan Wang;Haibo Lin;Ling Shao
Affiliations:
IBM, Beijing, China;Tsinghua University, Beijing, China;IBM, Beijing, China;IBM, Beijing, China;IBM, Beijing, China;IBM, Beijing, China
Venue:
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Year:
2009

Abstract

Multi-core processors are widely used in computer systems. As the performance of microprocessors greatly exceeds that of memory, the memory wall becomes a limiting factor. It is important to understand how the large speed disparity between processor and memory influences the performance and scalability of Java applications on emerging multi-core platforms. In this paper, we study two popular Java benchmarks, SPECjbb2005 and SPECjvm2008, on multi-core platforms including Intel Clovertown and AMD Phenom. We focus on the "partially scalable" benchmark programs. With a small number of CPU cores these programs scale perfectly, but when more cores and software threads are used, the slope of the scalability curve degrades dramatically. In our experiments with these partially scalable programs, we identify a strong correlation between scalability, object allocation rate, and memory bus write traffic. We find that these applications allocate large amounts of memory and consume almost all of the memory write bandwidth of our hardware platforms. Because the write bandwidth is so limited, we propose the following hypothesis: for object-allocation-intensive Java applications on emerging multi-core platforms, scalability and performance are limited by object allocation, as if these applications were running into an "allocation wall". To verify this hypothesis, we perform several experiments, including measuring key architecture-level metrics, composing a micro-benchmark program, and studying the effect of modifying some of the "partially scalable" programs. All of the experiments strongly suggest the existence of the allocation wall.
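
The abstract mentions a micro-benchmark program but this record does not show it. As a rough illustration only, the sketch below shows one way an allocation-heavy scaling test could look: each thread allocates many small arrays, and the aggregate allocation rate is reported for increasing thread counts. The class name, the 64-byte object size, the ring buffer, and the iteration count are all assumptions made for this example and are not taken from the paper.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not the authors' micro-benchmark.
class AllocationScalingSketch {
    // Assumed payload size of each allocated array, in bytes.
    static final int OBJECT_BYTES = 64;
    // Assumed ring size; storing into the ring is intended to make the
    // arrays escape so the JIT is less likely to optimize the allocation away.
    static final int RING_SLOTS = 1024;

    /**
     * Starts `threads` workers that each allocate `allocsPerThread` byte arrays.
     * Adds the total payload bytes allocated to `sink` and returns elapsed nanoseconds.
     */
    static long runOnce(int threads, int allocsPerThread, AtomicLong sink)
            throws InterruptedException {
        Thread[] workers = new Thread[threads];
        CountDownLatch start = new CountDownLatch(1);
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    start.await(); // release all workers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                byte[][] ring = new byte[RING_SLOTS][];
                long bytes = 0;
                for (int i = 0; i < allocsPerThread; i++) {
                    byte[] obj = new byte[OBJECT_BYTES];
                    obj[0] = (byte) i;              // touch the new object
                    ring[i & (RING_SLOTS - 1)] = obj; // overwrite an old slot
                    bytes += obj.length;
                }
                sink.addAndGet(bytes);
            });
            workers[t].start();
        }
        long begin = System.nanoTime();
        start.countDown();
        for (Thread w : workers) {
            w.join();
        }
        return System.nanoTime() - begin;
    }

    public static void main(String[] args) throws InterruptedException {
        int maxThreads = Runtime.getRuntime().availableProcessors();
        int allocsPerThread = 2_000_000; // assumed workload size
        for (int threads = 1; threads <= maxThreads; threads *= 2) {
            AtomicLong sink = new AtomicLong();
            long nanos = runOnce(threads, allocsPerThread, sink);
            double mbPerSec = (sink.get() / 1e6) / (nanos / 1e9);
            System.out.printf("threads=%d payload=%d bytes rate=%.1f MB/s%n",
                    threads, sink.get(), mbPerSec);
        }
    }
}
```

If the reported rate stops growing as threads are added, that would be consistent with the kind of partial scalability the abstract describes, although a sketch like this cannot by itself separate memory write bandwidth from other causes such as garbage collection pauses or JIT warm-up. Hardware counters, as in the architecture-level measurements the abstract mentions, would be needed for that.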