Automatic inline allocation of objects
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Using generational garbage collection to implement cache-conscious data placement
Proceedings of the 1st international symposium on Memory management
An evaluation of automatic object inline allocation techniques
Proceedings of the 13th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Segregating heap objects by reference behavior and lifetime
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
An automatic object inlining optimization and its evaluation
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Creating and preserving locality of java applications at allocation and garbage collection times
OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Dynamic Partitioning of Shared Cache Memory
The Journal of Supercomputing
Myths and realities: the performance impact of garbage collection
Proceedings of the joint international conference on Measurement and modeling of computer systems
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Architectural support for operating system-driven CMP cache management
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
SPEC CPU2006 benchmark descriptions
ACM SIGARCH Computer Architecture News
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Cooperative cache partitioning for chip multiprocessors
Proceedings of the 21st annual international conference on Supercomputing
Microphase: an approach to proactively invoking garbage collection for improved performance
Proceedings of the 22nd annual ACM SIGPLAN conference on Object-oriented programming systems and applications
Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
A Framework for Providing Quality of Service in Chip Multi-Processors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Analysis and approximation of optimal co-scheduling on chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A study on optimally co-scheduling jobs of different lengths on chip multiprocessors
Proceedings of the 6th ACM conference on Computing frontiers
Allocation wall: a limiting factor of Java applications on emerging multi-core platforms
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Optimizing shared cache behavior of chip multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Cache topology aware computation mapping for multicores
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Hi-index | 0.00 |
Understanding shared cache performance when executing multithreaded object-oriented applications and optimizing these applications for multicores have not received much attention. In this paper, we first quantify the intra-thread and inter-thread cache line (block) reuse characteristics of a set of multithreaded C++ programs when executed in shared cache based multicores. Our results show that, as far as shared on-chip caches are concerned, inter-thread cache line (block) reuse distances are much higher than intra-thread cache line reuse distances. We study the impact of these characteristics on the hit/miss behavior of the shared last-level cache on a commercial multicore machine. We then show that, by rearranging accesses to the objects shared across different threads and to the objects stored in nearby memory locations, inter-thread (temporal and spatial) object reuse distances can be reduced, which in turn helps to reduce inter-thread cache line reuse distances. The results we collected using eight multithreaded applications show that our proposed shared cache-aware code restructuring strategy can reduce misses in the last-level on-chip cache of a commercial multicore machine by 25.4%, on average. These savings in cache misses translate in turn to average execution time improvement of 11.9%.