A case for shared instruction cache on chip multiprocessors running OLTP

Authors:
Partha Kundu;Murali Annavaram;Trung Diep;John Shen
Affiliations:
Intel Corporation;Intel Corporation;Intel Corporation;Intel Corporation
Venue:
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
Year:
2003

Citing 12
Cited 4

The case for a single-chip multiprocessor

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Designing high bandwidth on-chip caches

Proceedings of the 24th annual international symposium on Computer architecture
Memory system characterization of commercial workloads

Proceedings of the 25th annual international symposium on Computer architecture
Performance characterization of a Quad Pentium Pro SMP using OLTP workloads

Proceedings of the 25th annual international symposium on Computer architecture
An analysis of database workload performance on simultaneous multithreaded processors

Proceedings of the 25th annual international symposium on Computer architecture
Performance of database workloads on shared-memory systems with out-of-order processors

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Piranha: a scalable architecture based on single-chip multiprocessing

Proceedings of the 27th annual international symposium on Computer architecture
Code layout optimizations for transaction processing workloads

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
DBMSs on a Modern Processor: Where Does Time Go?

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Variability in Architectural Simulations of Multi-Threaded Workloads

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
SimICS/sun4m: a virtual workstation

ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference

Managing Wire Delay in Large Chip-Multiprocessor Caches

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Dynamically configurable shared CMP helper engines for improved performance

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Large scale Itanium® 2 processor OLTP workload characterization and optimization

DaMoN '06 Proceedings of the 2nd international workshop on Data management on new hardware
Improving the performance and power efficiency of shared helpers in CMPs

CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Due to their large code footprint, OLTP workloads suffer from significant I-cache miss rates on contemporary microprocessors. This paper analyzes the I-stream behavior of an OLTP workload, called the Oracle Database Benchmark (ODB), on Chip-Multiprocessors (CMP). Our results show that, although, the overall code footprint of ODB is large, multiple ODB threads running concurrently on multiple processors tend to access common code segments frequently, thus exhibiting significant constructive sharing. In fact, in a CMP system, an I-cache shared between multiple processors incurs similar miss rate as a dedicated I-cache per processor where the per processor I-cache has the same capacity as the shared I-cache. Based on these observations, this paper makes the case for a shared I-cache organization in a CMP, instead of the traditional approach of using a dedicated I-cache per processor.Furthermore, this paper shows that OLTP code stream exhibits good spatial locality. Adding a simple dedicated Line Buffer per processor can exploit this spatial locality effectively, to reduce latency and bandwidth requirements on the shared cache. The proposed shared I-cache organization results in an improvement of at least 5X in miss rate over a dedicated cache organization, for the same total capacity.