Characterization of L3 cache behavior of SPECjAppServer2002 and TPC-C

Authors:
Eriko Nurvitadhi; Nirut Chalainanont; Shih-Lien Lu
Affiliations:
Carnegie Mellon University, Pittsburgh, PA; Oregon State University, Corvallis, OR; MRL, Intel Labs, Hillsboro, OR
Venue:
Proceedings of the 19th annual international conference on Supercomputing
Year:
2005

Abstract

With the proliferation of e-businesses, Java™ middleware and OLTP applications are gaining importance. As the gap between CPU and memory latencies continues to grow, the performance of these applications on multiprocessor systems will become increasingly limited by the memory system. This study characterizes the memory behavior of such applications using the SPECjAppServer2002 and TPC-C benchmarks running on a real multiprocessor system. Specifically, shared and private L3 caches with invalidation-based and update-based coherence protocols are evaluated using the Programmable Hardware-Assisted Cache Emulator (PHA$E). We found that coherence misses increase with larger private L3 caches, rising to more than 15% of all misses for both benchmarks. Additionally, we observed a saturation point beyond which larger private caches yield no further improvement in miss ratio. In contrast, the shared L3 cache design proved more scalable because it does not suffer from coherence misses. Our limit study shows that the existing Write-Broadcast policy, which updates copies of a line in other caches when a shared line is written, has the potential to reduce both the private cache miss ratio and bus traffic. For example, at 64MB it reduces the miss ratio by 53% for SPECjAppServer2002 and by 44% for TPC-C, while lowering bus traffic by 18% and 11%, respectively. Overall, the policy can eliminate the aforementioned saturation point and allows private caches to reach miss ratios comparable to those of a shared cache.
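To make the distinction between invalidation-based coherence and Write-Broadcast concrete, the following is a minimal toy sketch, not the PHA$E emulator or the paper's methodology. It assumes one fully associative LRU private cache per CPU, line-granular addresses, write-allocate on misses, and a bus cost of one transaction per line fetch plus one per invalidate or update broadcast. The class `PrivateCaches`, the helper `run`, and the producer/consumer trace are hypothetical names chosen for illustration. A miss is counted as a coherence miss when the line was previously removed from that cache by a remote write rather than by eviction.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Stats:
    accesses: int = 0
    misses: int = 0
    coherence_misses: int = 0
    bus_transactions: int = 0


class PrivateCaches:
    """Hypothetical toy model: one fully associative LRU cache per CPU."""

    def __init__(self, num_cpus, capacity_lines, policy):
        assert policy in ("invalidate", "update")
        self.policy = policy
        self.capacity = capacity_lines
        self.caches = [OrderedDict() for _ in range(num_cpus)]
        # Lines removed from each cache by a remote write (not by eviction).
        self.invalidated = [set() for _ in range(num_cpus)]
        self.stats = Stats()

    def _touch(self, cpu, line):
        cache = self.caches[cpu]
        self.stats.accesses += 1
        if line in cache:
            cache.move_to_end(line)  # hit: mark as most recently used
            return
        self.stats.misses += 1
        self.stats.bus_transactions += 1  # assumed cost: one bus fetch per miss
        if line in self.invalidated[cpu]:
            self.stats.coherence_misses += 1
            self.invalidated[cpu].discard(line)
        cache[line] = True  # allocate on both read and write misses
        if len(cache) > self.capacity:
            cache.popitem(last=False)  # evict the least recently used line

    def access(self, cpu, op, line):
        self._touch(cpu, line)
        if op != "W":
            return
        sharers = [c for c in range(len(self.caches))
                   if c != cpu and line in self.caches[c]]
        if not sharers:
            return
        self.stats.bus_transactions += 1  # assumed: one broadcast per shared write
        if self.policy == "invalidate":
            for c in sharers:
                del self.caches[c][line]
                self.invalidated[c].add(line)
        # "update" (Write-Broadcast): remote copies stay valid with the new data.


def run(policy, trace, num_cpus=2, capacity_lines=4):
    sim = PrivateCaches(num_cpus, capacity_lines, policy)
    for cpu, op, line in trace:
        sim.access(cpu, op, line)
    return sim.stats


# Hypothetical producer/consumer pattern: CPU 0 writes line 1, CPU 1 reads it.
TRACE = [(0, "W", 1), (1, "R", 1)] * 3

for policy in ("invalidate", "update"):
    print(policy, run(policy, TRACE))
```

On this trace the invalidate policy incurs 4 misses (2 of them coherence misses) and 6 bus transactions, while the update policy incurs only the 2 cold misses and 4 bus transactions, which mirrors the direction of the effect the abstract reports. The toy model also shows the trade-off: a trace in which one CPU writes a shared line repeatedly with no intervening remote reads would make the update policy broadcast on every write, so whether Write-Broadcast lowers traffic depends on the sharing pattern.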