Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
Performance characterization of a Quad Pentium Pro SMP using OLTP workloads
Proceedings of the 25th annual international symposium on Computer architecture
Performance of database workloads on shared-memory systems with out-of-order processors
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Exploiting the Cache Capacity of a Single-Chip Multi-Core Processor with Execution Migration
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Improving instruction cache performance in OLTP
ACM Transactions on Database Systems (TODS)
Computation spreading: employing hardware migration to specialize CMP cores on-the-fly
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Adaptive insertion policies for high performance caching
Proceedings of the 34th annual international symposium on Computer architecture
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Shore-MT: a scalable storage manager for the multicore era
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Temporal instruction fetch streaming
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Clearing the clouds: a study of emerging scale-out workloads on modern hardware
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
SLICC: Self-Assembly of Instruction Cache Collectives for OLTP Workloads
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
During an instruction miss a processor is unable to fetch instructions. The more frequent instruction misses are the less able a modern processor is to find useful work to do and thus performance suffers. Online transaction processing (OLTP) suffers from high instruction miss rates since the instruction footprint of OLTP transactions does not fit in today's L1-I caches. However, modern many-core chips have ample aggregate L1 cache capacity across multiple cores. Looking at the code paths concurrently executing transactions follow, we observe a high degree of repetition both within and across transactions. This work presents TMi a technique that uses thread migration to reduce instruction misses by spreading the footprint of a transaction over multiple L1 caches. TMi is a software-transparent, hardware technique; TMi requires no code instrumentation, and efficiently utilizes available cache capacity. This work evaluates TMi's potential and shows that it may reduce instruction misses by 51% on average. This work discusses the underlying tradeoffs and challenges, such as an increase in data misses, and points to potential solutions.