SLICC: Self-Assembly of Instruction Cache Collectives for OLTP Workloads

Authors:
Islam Atta;Pinar Tozun;Anastasia Ailamaki;Andreas Moshovos
Affiliations:
-;-;-;-
Venue:
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2012

Citing 20
Cited 4

Evaluating Associativity in CPU Caches

IEEE Transactions on Computers
Performance characterization of a Quad Pentium Pro SMP using OLTP workloads

Proceedings of the 25th annual international symposium on Computer architecture
Performance of database workloads on shared-memory systems with out-of-order processors

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Bloom filtering cache misses for accurate data speculation and prefetching

ICS '02 Proceedings of the 16th international conference on Supercomputing
Pin: building customized program analysis tools with dynamic instrumentation

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Intel Virtualization Technology

Computer
Exploiting the Cache Capacity of a Single-Chip Multi-Core Processor with Execution Migration

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Improving instruction cache performance in OLTP

ACM Transactions on Database Systems (TODS)
Computation spreading: employing hardware migration to specialize CMP cores on-the-fly

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Adaptive insertion policies for high performance caching

Proceedings of the 34th annual international symposium on Computer architecture
Shore-MT: a scalable storage manager for the multicore era

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Temporal instruction fetch streaming

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Reactive NUCA: near-optimal block placement and replication in distributed caches

Proceedings of the 36th annual international symposium on Computer architecture
Thread motion: fine-grained power management for multi-core systems

Proceedings of the 36th annual international symposium on Computer architecture
High performance cache replacement using re-reference interval prediction (RRIP)

Proceedings of the 37th annual international symposium on Computer architecture
Data-oriented transaction execution

Proceedings of the VLDB Endowment
Clearing the clouds: a study of emerging scale-out workloads on modern hardware

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Proactive instruction fetch

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Reducing OLTP instruction misses with thread migration

DaMoN '12 Proceedings of the Eighth International Workshop on Data Management on New Hardware

From A to E: analyzing TPC's OLTP benchmarks: the obsolete, the ubiquitous, the unexplored

Proceedings of the 16th International Conference on Extending Database Technology
OLTP in wonderland: where do cache misses come from in major OLTP components?

Proceedings of the Ninth International Workshop on Data Management on New Hardware
STREX: boosting instruction cache reuse in OLTP workloads through stratified transaction execution

Proceedings of the 40th Annual International Symposium on Computer Architecture
SHIFT: shared history instruction fetch for lean-core server processors

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

Online transaction processing (OLTP) is at the core of many data center applications. OLTP workloads are known to have large instruction footprints that foil existing L1 instruction caches resulting in poor overall performance. Prefetching can reduce the impact of such instruction cache miss stalls, however, state-of-the-art solutions require large dedicated hardware tables on the order of 40KB in size. SLICC is a programmer transparent, low cost technique to minimize instruction cache misses when executing OLTP workloads. SLICC migrates threads, spreading their instruction footprint over several L1 caches. It exploits repetition within and across transactions, where a transaction's first iteration prefetches the instructions for subsequent iterations or similar subsequent transactions. SLICC reduces instruction misses by 58% on average for TPC-C and TPCE, thereby improving performance by 68%. When compared to a state-of-the-art prefetcher, and notwithstanding the increased storage overheads (42x as compared to SLICC), performance using SLICC is 21% higher for TPC-E and within 2% for TPC-C.