Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
Memory bandwidth limitations of future microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
A real-time garbage collector based on the lifetimes of objects
Communications of the ACM
A LISP garbage-collector for virtual-memory computer systems
Communications of the ACM
Recursive functions of symbolic expressions and their computation by machine, Part I
Communications of the ACM
Heap profiling for space-efficient Java
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Computer
Generation Scavenging: A non-disruptive high performance storage reclamation algorithm
SDE 1 Proceedings of the first ACM SIGSOFT/SIGPLAN software engineering symposium on Practical software development environments
Initial Observations of the Simultaneous Multithreading Pentium 4 Processor
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Myths and realities: the performance impact of garbage collection
Proceedings of the joint international conference on Measurement and modeling of computer systems
Chip Multithreading: Opportunities and Challenges
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Power Efficient Processor Architecture and The Cell Processor
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
NUMA-Aware Java Heaps for Server Applications
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Simulating Commercial Java Throughput Workloads: A Case Study
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
The DaCapo benchmarks: java benchmarking development and analysis
Proceedings of the 21st annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications
lmbench: portable tools for performance analysis
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
Uniqueness inference for compile-time object deallocation
Proceedings of the 6th international symposium on Memory management
Statistically rigorous java performance evaluation
Proceedings of the 22nd annual ACM SIGPLAN conference on Object-oriented programming systems and applications
Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
A principled approach to nondeferred reference-counting garbage collection
Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
The CLOSER: automating resource management in java
Proceedings of the 7th international symposium on Memory management
Jolt: lightweight dynamic analysis and removal of object churn
Proceedings of the 23rd ACM SIGPLAN conference on Object-oriented programming systems languages and applications
Performance Studies of Commercial Workloads on a Multi-core System
IISWC '07 Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization
Addressing Cache/Memory Overheads in Enterprise Java CMP Servers
IISWC '07 Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization
Scalability limitations when running a Java web server on a chip multiprocessor
Proceedings of the 3rd Annual Haifa Experimental Systems Conference
Performance analysis of idle programs
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Deferred gratification: engineering for high performance garbage collection from the get go
Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Multicore garbage collection with local heaps
Proceedings of the international symposium on Memory management
Garbage collection auto-tuning for Java mapreduce on multi-cores
Proceedings of the international symposium on Memory management
Pervasive parallelism for managed runtimes
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
ORDER: object centric deterministic replay for Java
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Reuse, recycle to de-bloat software
Proceedings of the 25th European conference on Object-oriented programming
The interplay of software bloat, hardware energy proportionality and system bottlenecks
HotPower '11 Proceedings of the 4th Workshop on Power-Aware Computing and Systems
Why nothing matters: the impact of zeroing
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Improving shared cache behavior of multithreaded object-oriented applications in multicores
Proceedings of the International Conference on Computer-Aided Design
Continuous object access profiling and optimizations to overcome the memory wall and bloat
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Does lean imply green?: a study of the power performance implications of Java runtime bloat
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
A black-box approach to understanding concurrency in DaCapo
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Mio: a high-performance multicore io manager for GHC
Proceedings of the 2013 ACM SIGPLAN symposium on Haskell
Hi-index | 0.00 |
Multi-core processors are widely used in computer systems. As the performance of microprocessors greatly exceeds that of memory, the memory wall becomes a limiting factor. It is important to understand how the large disparity of speed between processor and memory influences the performance and scalability of Java applications on emerging multi-core platforms. In this paper, we studied two popular Java benchmarks, SPECjbb2005 and SPECjvm2008, on multi-core platforms including Intel Clovertown and AMD Phenom. We focus on the "partially scalable" benchmark programs. With smaller number of CPU cores these programs scale perfectly, but when more cores and software threads are used, the slope of the scalability curve degrades dramatically. We identified a strong correlation between scalability, object allocation rate and memory bus write traffic in our experiments with our partially scalable programs. We find that these applications allocate large amounts of memory and consume almost all the memory write bandwidth in our hardware platforms. Because the write bandwidth is so limited, we propose the following hypothesis: the scalability and performance is limited by the object allocation on emerging multi-core platforms for those objects-allocation intensive Java applications, as if these applications are running into an "allocation wall". In order to verify this hypothesis, several experiments are performed, including measuring key architecture level metrics, composing a micro-benchmark program, and studying the effect of modifying some of the "partially scalable" programs. All the experiments strongly suggest the existence of the allocation wall.