Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
Performance characterization of a Quad Pentium Pro SMP using OLTP workloads
Proceedings of the 25th annual international symposium on Computer architecture
Performance of database workloads on shared-memory systems with out-of-order processors
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Bloom filtering cache misses for accurate data speculation and prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Intel Virtualization Technology
Computer
Exploiting the Cache Capacity of a Single-Chip Multi-Core Processor with Execution Migration
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Improving instruction cache performance in OLTP
ACM Transactions on Database Systems (TODS)
Computation spreading: employing hardware migration to specialize CMP cores on-the-fly
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Adaptive insertion policies for high performance caching
Proceedings of the 34th annual international symposium on Computer architecture
Shore-MT: a scalable storage manager for the multicore era
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Temporal instruction fetch streaming
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Reactive NUCA: near-optimal block placement and replication in distributed caches
Proceedings of the 36th annual international symposium on Computer architecture
Thread motion: fine-grained power management for multi-core systems
Proceedings of the 36th annual international symposium on Computer architecture
High performance cache replacement using re-reference interval prediction (RRIP)
Proceedings of the 37th annual international symposium on Computer architecture
Data-oriented transaction execution
Proceedings of the VLDB Endowment
Clearing the clouds: a study of emerging scale-out workloads on modern hardware
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Reducing OLTP instruction misses with thread migration
DaMoN '12 Proceedings of the Eighth International Workshop on Data Management on New Hardware
From A to E: analyzing TPC's OLTP benchmarks: the obsolete, the ubiquitous, the unexplored
Proceedings of the 16th International Conference on Extending Database Technology
OLTP in wonderland: where do cache misses come from in major OLTP components?
Proceedings of the Ninth International Workshop on Data Management on New Hardware
STREX: boosting instruction cache reuse in OLTP workloads through stratified transaction execution
Proceedings of the 40th Annual International Symposium on Computer Architecture
SHIFT: shared history instruction fetch for lean-core server processors
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
Online transaction processing (OLTP) is at the core of many data center applications. OLTP workloads are known to have large instruction footprints that foil existing L1 instruction caches resulting in poor overall performance. Prefetching can reduce the impact of such instruction cache miss stalls, however, state-of-the-art solutions require large dedicated hardware tables on the order of 40KB in size. SLICC is a programmer transparent, low cost technique to minimize instruction cache misses when executing OLTP workloads. SLICC migrates threads, spreading their instruction footprint over several L1 caches. It exploits repetition within and across transactions, where a transaction's first iteration prefetches the instructions for subsequent iterations or similar subsequent transactions. SLICC reduces instruction misses by 58% on average for TPC-C and TPCE, thereby improving performance by 68%. When compared to a state-of-the-art prefetcher, and notwithstanding the increased storage overheads (42x as compared to SLICC), performance using SLICC is 21% higher for TPC-E and within 2% for TPC-C.