ATOM: a system for building customized program analysis tools
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Shoring up persistent applications
SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Contrasting characteristics and cache performance of technical and multi-user commercial workloads
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The impact of architectural trends on operating system performance
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Eraser: a dynamic data race detector for multithreaded programs
ACM Transactions on Computer Systems (TOCS)
Memory system characterization of commercial workloads
Proceedings of the 25th annual international symposium on Computer architecture
Performance characterization of a Quad Pentium Pro SMP using OLTP workloads
Proceedings of the 25th annual international symposium on Computer architecture
An analysis of database workload performance on simultaneous multithreaded processors
Proceedings of the 25th annual international symposium on Computer architecture
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
Performance of database workloads on shared-memory systems with out-of-order processors
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Improving index performance through prefetching
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Code layout optimizations for transaction processing workloads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Benchmark Handbook: For Database and Transaction Processing Systems
Benchmark Handbook: For Database and Transaction Processing Systems
Proceedings of the 17th International Conference on Data Engineering
Block Oriented Processing of Relational Database Operations in Modern Computer Architectures
Proceedings of the 17th International Conference on Data Engineering
DBMSs on a Modern Processor: Where Does Time Go?
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Weaving Relations for Cache Performance
Proceedings of the 27th International Conference on Very Large Data Bases
Cache Conscious Algorithms for Relational Query Processing
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Instruction prefetching using branch prediction information
ICCD '97 Proceedings of the 1997 International Conference on Computer Design (ICCD '97)
Call graph prefetching for database applications
ACM Transactions on Computer Systems (TOCS)
Buffering databse operations for enhanced instruction cache performance
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
ACM SIGMETRICS Performance Evaluation Review - Special issue on tools for computer architecture research
Instrumentation and optimization of Win32/intel executables using Etch
NT'97 Proceedings of the USENIX Windows NT Workshop on The USENIX Windows NT Workshop 1997
Empirical evaluation of multi-level buffer cache collaboration for storage systems
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Improving database performance on simultaneous multithreading processors
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Improving instruction cache performance in OLTP
ACM Transactions on Database Systems (TODS)
Computation spreading: employing hardware migration to specialize CMP cores on-the-fly
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Vectorized data processing on the cell broadband engine
DaMoN '07 Proceedings of the 3rd international workshop on Data management on new hardware
Breaking the memory wall in MonetDB
Communications of the ACM - Surviving the data deluge
Temporal instruction fetch streaming
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Dynamic heterogeneity and the need for multicore virtualization
ACM SIGOPS Operating Systems Review
Database architecture evolution: mammals flourished long before dinosaurs became extinct
Proceedings of the VLDB Endowment
Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Micro-specialization: dynamic code specialization of database management systems
Proceedings of the Tenth International Symposium on Code Generation and Optimization
RDIP: return-address-stack directed instruction prefetching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
SHIFT: shared history instruction fetch for lean-core server processors
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
Online transaction processing (OLTP) is a multibillion dollar industry with high-end database servers employing state-of-the-art processors to maximize performance. Unfortunately, recent studies show that CPUs are far from realizing their maximum intended throughput because of delays in the processor caches. When running OLTP, instruction-related delays in the memory subsystem account for 25 to 40% of the total execution time. In contrast to data, instruction misses cannot be overlapped with out-of-order execution, and instruction caches cannot grow as the slower access time directly affects the processor speed. The challenge is to alleviate the instruction-related delays without increasing the cache size. We propose Steps, a technique that minimizes instruction cache misses in OLTP workloads by multiplexing concurrent transactions and exploiting common code paths. One transaction paves the cache with instructions, while close followers enjoy a nearly miss-free execution. Steps yields up to 96.7% reduction in instruction cache misses for each additional concurrent transaction, and at the same time eliminates up to 64% of mispredicted branches by loading a repeating execution pattern into the CPU. This paper (a) describes the design and implementation of Steps, (b) analyzes Steps using microbenchmarks, and (c) shows Steps performance when running TPC-C on top of the Shore storage manager.