An analysis of database workload performance on simultaneous multithreaded processors

Authors:
Jack L. Lo; Luiz André Barroso; Susan J. Eggers; Kourosh Gharachorloo; Henry M. Levy; Sujay S. Parekh
Affiliations:
Dept. of Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA; Digital Equipment Corporation, Western Research Laboratory, 250 University Ave., Palo Alto, CA; Dept. of Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA; Digital Equipment Corporation, Western Research Laboratory, 250 University Ave., Palo Alto, CA; Dept. of Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA; Dept. of Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA
Venue:
Proceedings of the 25th annual international symposium on Computer architecture
Year:
1998

Abstract

Simultaneous multithreading (SMT) is an architectural technique in which the processor issues multiple instructions from multiple threads each cycle. While SMT has been shown to be effective on scientific workloads, its performance on database systems is still an open question. In particular, database systems have poor cache performance, and the addition of multithreading has the potential to exacerbate cache conflicts.

This paper examines database performance on SMT processors using traces of the Oracle database management system. Our research makes three contributions. First, it characterizes the memory-system behavior of database systems running on-line transaction processing and decision support system workloads. Our data show that while DBMS workloads have large memory footprints, there is substantial data reuse in a small, cacheable "critical" working set. Second, we show that the additional data cache conflicts caused by simultaneous multithreaded instruction scheduling can be nearly eliminated by the proper choice of software-directed policies for virtual-to-physical page mapping and per-process address offsetting. Our results demonstrate that with the best policy choices, D-cache miss rates on an 8-context SMT are roughly equivalent to those on a single-threaded superscalar. Multithreading also leads to better interthread instruction cache sharing, reducing I-cache miss rates by up to 35%. Third, we show that SMT's latency tolerance is highly effective for database applications. For example, using a memory-intensive OLTP workload, an 8-context SMT processor achieves a 3-fold increase in instruction throughput over a single-threaded superscalar with similar resources.
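
The abstract does not spell out how per-process address offsetting works, so the following is only a toy illustration of the general idea, not the paper's policy. It assumes a hypothetical 32 KB direct-mapped cache with 64-byte lines, a hypothetical hot per-process structure at the same address in every process, and a hypothetical offset of one 4 KB page per hardware context. With no offset, all eight contexts compete for one cache set; with the offset, they spread across eight sets.

```python
# Toy model: how a per-process address offset spreads accesses over cache sets.
# All sizes and addresses here are illustrative assumptions, not values from the paper.

LINE_SIZE = 64     # bytes per cache line (assumed)
NUM_SETS = 512     # direct-mapped, so 512 * 64 B = 32 KB (assumed)
PAGE_SIZE = 4096   # bytes per page (assumed)


def cache_set(addr):
    """Return the set index a byte address maps to in the toy cache."""
    return (addr // LINE_SIZE) % NUM_SETS


def sets_touched(base, contexts, offset_step):
    """Return the distinct sets used when each context accesses `base`
    shifted by context_id * offset_step bytes."""
    return {cache_set(base + ctx * offset_step) for ctx in range(contexts)}


BASE = 0x10000  # hypothetical address of a hot per-process structure

no_offset = sets_touched(BASE, contexts=8, offset_step=0)
with_offset = sets_touched(BASE, contexts=8, offset_step=PAGE_SIZE)

print("distinct sets without offset:", len(no_offset))
print("distinct sets with offset:   ", len(with_offset))
```

In this sketch the address is used directly as the cache index address. A physically indexed cache would see the same effect only if the virtual-to-physical page mapping policy preserved the offset in the index bits, which is presumably why the abstract treats the two policies together.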