Reducing OLTP instruction misses with thread migration

Authors:
Islam Atta;Pinar Tözün;Anastasia Ailamaki;Andreas Moshovos
Affiliations:
University of Toronto;Eacole Polytechnique Fédérale de Lausanne;Eacole Polytechnique Fédérale de Lausanne;University of Toronto
Venue:
DaMoN '12 Proceedings of the Eighth International Workshop on Data Management on New Hardware
Year:
2012

Citing 14
Cited 1

Evaluating Associativity in CPU Caches

IEEE Transactions on Computers
Performance characterization of a Quad Pentium Pro SMP using OLTP workloads

Proceedings of the 25th annual international symposium on Computer architecture
Performance of database workloads on shared-memory systems with out-of-order processors

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Pin: building customized program analysis tools with dynamic instrumentation

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Exploiting the Cache Capacity of a Single-Chip Multi-Core Processor with Execution Migration

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Improving instruction cache performance in OLTP

ACM Transactions on Database Systems (TODS)
Computation spreading: employing hardware migration to specialize CMP cores on-the-fly

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Adaptive insertion policies for high performance caching

Proceedings of the 34th annual international symposium on Computer architecture
Predictor virtualization

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Shore-MT: a scalable storage manager for the multicore era

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Temporal instruction fetch streaming

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Clearing the clouds: a study of emerging scale-out workloads on modern hardware

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Proactive instruction fetch

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture

SLICC: Self-Assembly of Instruction Cache Collectives for OLTP Workloads

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

During an instruction miss a processor is unable to fetch instructions. The more frequent instruction misses are the less able a modern processor is to find useful work to do and thus performance suffers. Online transaction processing (OLTP) suffers from high instruction miss rates since the instruction footprint of OLTP transactions does not fit in today's L1-I caches. However, modern many-core chips have ample aggregate L1 cache capacity across multiple cores. Looking at the code paths concurrently executing transactions follow, we observe a high degree of repetition both within and across transactions. This work presents TMi a technique that uses thread migration to reduce instruction misses by spreading the footprint of a transaction over multiple L1 caches. TMi is a software-transparent, hardware technique; TMi requires no code instrumentation, and efficiently utilizes available cache capacity. This work evaluates TMi's potential and shows that it may reduce instruction misses by 51% on average. This work discusses the underlying tradeoffs and challenges, such as an increase in data misses, and points to potential solutions.