Hiding memory latency using dynamic scheduling in shared-memory multiprocessors
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Alternative implementations of two-level adaptive branch prediction
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Cooperative shared memory: software and hardware for scalable multiprocessors
ACM Transactions on Computer Systems (TOCS)
Adaptive cache coherency for detecting migratory shared data
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
An adaptive cache coherence protocol optimized for migratory sharing
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
ATOM: a system for building customized program analysis tools
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Characterization of alpha AXP performance using TP and SPEC workloads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Contrasting characteristics and cache performance of technical and multi-user commercial workloads
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The impact of architectural trends on operating system performance
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
A compiler algorithm that reduces read latency in ownership-based cache coherence protocols
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Evaluation of design alternatives for a multiprocessor microprocessor
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Evaluation of multithreaded uniprocessors for commercial application environments
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
An evaluation of memory consistency models for shared-memory systems with ILP processors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Studies of Windows NT performance using dynamic execution traces
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
Memory system characterization of commercial workloads
Proceedings of the 25th annual international symposium on Computer architecture
Performance characterization of a Quad Pentium Pro SMP using OLTP workloads
Proceedings of the 25th annual international symposium on Computer architecture
An analysis of database workload performance on simultaneous multithreaded processors
Proceedings of the 25th annual international symposium on Computer architecture
Performance of an OLTP application on symmetry multiprocessor system
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
The Memory Performance of DSS Commercial Workloads in Shared-Memory Multiprocessors
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
An Evaluation of Fine-Grain Producer-Initiated Communication in Cache-Coherent Multiprocessors
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
The Impact of Instruction-Level Parallelism on Multiprocessor Performance and Simulation Methodology
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Performance of image and video processing with general-purpose processors and media ISA extensions
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
A performance comparison of contemporary DRAM architectures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Improving index performance through prefetching
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
The effect of seance communication on multiprocessing systems
ACM Transactions on Computer Systems (TOCS)
A study of memory system performance of multimedia applications
Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Code layout optimizations for transaction processing workloads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
High-Performance DRAMs in Workstation Environments
IEEE Transactions on Computers
Implementation aspects of a SPARC V9 complete machine simulator
ACSC '02 Proceedings of the twenty-fifth Australasian conference on Computer science - Volume 4
Scalability and resource usage of an OLAP benchmark on clusters of PCs
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
A comparative study of arbitration algorithms for the Alpha 21364 pipelined router
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Software Trace Cache for Commercial Applications
International Journal of Parallel Programming
DBMSs on a Modern Processor: Where Does Time Go?
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Inferential queueing and speculative push for reducing critical communication latencies
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Proceedings of the 30th annual international symposium on Computer architecture
Behavior and Performance of Interactive Multi-Player Game Servers
Cluster Computing
Scaling and Charact rizing Database Workloads: Bridging the Gap between Research and Practice
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
The Impact of Negative Acknowledgments in Shared Memory Scientific Applications
IEEE Transactions on Parallel and Distributed Systems
Microarchitecture Optimizations for Exploiting Memory-Level Parallelism
Proceedings of the 31st annual international symposium on Computer architecture
A case for shared instruction cache on chip multiprocessors running OLTP
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
The Fuzzy Correlation between Code and Performance Predictability
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Journal of Parallel and Distributed Computing
Temporal Streaming of Shared Memory
Proceedings of the 32nd annual international symposium on Computer Architecture
Evaluating IA-32 web servers through simics: a practical experience
Journal of Systems Architecture: the EUROMICRO Journal
Maximizing CMP Throughput with Mediocre Cores
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Store-Ordered Streaming of Shared Memory
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Queue - Multiprocessors
Store Memory-Level Parallelism Optimizations for Commercial Applications
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
DBmbench: fast and accurate database workload representation on modern microarchitecture
CASCON '05 Proceedings of the 2005 conference of the Centre for Advanced Studies on Collaborative research
The RASE (Rapid, Accurate Simulation Environment) for chip multiprocessors
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Accelerating database operators using a network processor
DaMoN '05 Proceedings of the 1st international workshop on Data management on new hardware
Inferential queueing and speculative push
International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
Proceedings of the 33rd annual international symposium on Computer Architecture
Improving instruction cache performance in OLTP
ACM Transactions on Database Systems (TODS)
A performance counter architecture for computing accurate CPI components
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
An analysis of the effects of miss clustering on the cost of a cache miss
Proceedings of the 4th international conference on Computing frontiers
Mechanisms for store-wait-free multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Proceedings of the 2007 workshop on Experimental computer science
ecs'07 Experimental computer science on Experimental computer science
Steps towards cache-resident transaction processing
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Speeding-up multiprocessors running DBMS workloads through coherence protocols
International Journal of High Performance Computing and Networking
Shore-MT: a scalable storage manager for the multicore era
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Reactive NUCA: near-optimal block placement and replication in distributed caches
Proceedings of the 36th annual international symposium on Computer architecture
Experience with building a commodity intel-based ccNUMA system
IBM Journal of Research and Development
Optimizing the datacenter for data-centric workloads
Proceedings of the international conference on Supercomputing
Clearing the clouds: a study of emerging scale-out workloads on modern hardware
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Reducing OLTP instruction misses with thread migration
DaMoN '12 Proceedings of the Eighth International Workshop on Data Management on New Hardware
Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors
ACM Transactions on Computer Systems (TOCS)
From A to E: analyzing TPC's OLTP benchmarks: the obsolete, the ubiquitous, the unexplored
Proceedings of the 16th International Conference on Extending Database Technology
SLICC: Self-Assembly of Instruction Cache Collectives for OLTP Workloads
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Spatiotemporal Coherence Tracking
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
OLTP in wonderland: where do cache misses come from in major OLTP components?
Proceedings of the Ninth International Workshop on Data Management on New Hardware
STREX: boosting instruction cache reuse in OLTP workloads through stratified transaction execution
Proceedings of the 40th Annual International Symposium on Computer Architecture
SHIFT: shared history instruction fetch for lean-core server processors
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Eliminating unscalable communication in transaction processing
The VLDB Journal — The International Journal on Very Large Data Bases
Hi-index | 0.00 |
Database applications such as online transaction processing (OLTP) and decision support systems (DSS) constitute the largest and fastest-growing segment of the market for multiprocessor servers. However, most current system designs have been optimized to perform well on scientific and engineering workloads. Given the radically different behavior of database workloads (especially OLTP), it is important to re-evaluate key system design decisions in the context of this important class of applications.This paper examines the behavior of database workloads on shared-memory multiprocessors with aggressive out-of-order processors, and considers simple optimizations that can provide further performance improvements. Our study is based on detailed simulations of the Oracle commercial database engine. The results show that the combination of out-of-order execution and multiple instruction issue is indeed effective in improving performance of database workloads, providing gains of 1.5 and 2.6 times over an in-order single-issue processor for OLTP and DSS, respectively. In addition, speculative techniques enable optimized implementations of memory consistency models that significantly improve the performance of stricter consistency models, bringing the performance to within 10--15% of the performance of more relaxed models.The second part of our study focuses on the more challenging OLTP workload. We show that an instruction stream buffer is effective in reducing the remaining instruction stalls in OLTP, providing a 17% reduction in execution time (approaching a perfect instruction cache to within 15%). Furthermore, our characterization shows that a large fraction of the data communication misses in OLTP exhibit migratory behavior; our preliminary results show that software prefetch and writeback/flush hints can be used for this data to further reduce execution time by 12%.