Evaluating the impact of simultaneous multithreading on network servers using real hardware

Authors:
Yaoping Ruan; Vivek S. Pai; Erich Nahum; John M. Tracey
Affiliations:
Princeton University, Princeton, NJ; Princeton University, Princeton, NJ; IBM T.J. Watson Research Center, Yorktown Heights, NY; IBM T.J. Watson Research Center, Yorktown Heights, NY
Venue:
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Year:
2005

References (27)

An elementary processor architecture with simultaneous instruction issuing from multiple threads
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture

Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture

Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture

Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
ACM Transactions on Computer Systems (TOCS)

Continuous profiling: where have all the cycles gone?
Proceedings of the sixteenth ACM symposium on Operating systems principles

Performance characterization of a Quad Pentium Pro SMP using OLTP workloads
Proceedings of the 25th annual international symposium on Computer architecture

An analysis of database workload performance on simultaneous multithreaded processors
Proceedings of the 25th annual international symposium on Computer architecture

Effects of Multithreading on Cache Performance
IEEE Transactions on Computers - Special issue on cache memory and related problems

IO-Lite: a unified I/O buffering and caching system
ACM Transactions on Computer Systems (TOCS)

Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems

An analysis of operating system behavior on a simultaneous multithreaded architecture
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems

SEDA: an architecture for well-conditioned, scalable internet services
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles

Multi-processor performance on the Tera MTA
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing

Symbiotic jobscheduling with priorities for a simultaneous multithreading processor
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems

Complete Computer System Simulation: The SimOS Approach
IEEE Parallel & Distributed Technology: Systems & Technology

Simultaneous Multithreading: A Platform for Next-Generation Processors
IEEE Micro

Microarchitectural denial of service: insuring microarchitectural fairness
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture

Performance Characterization of the Pentium® Pro Processor
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture

Evaluating and Improving Performance of Multimedia Applications on Simultaneous Multi-Threading
ICPADS '02 Proceedings of the 9th International Conference on Parallel and Distributed Systems

Initial Observations of the Simultaneous Multithreading Pentium 4 Processor
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques

The Impact of Resource Partitioning on SMT Processors
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques

Myths and realities: the performance impact of garbage collection
Proceedings of the joint international conference on Measurement and modeling of computer systems

Making the "box" transparent: system call performance as a first-class result
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference

lmbench: portable tools for performance analysis
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference

Flash: an efficient and portable web server
ATEC '99 Proceedings of the annual conference on USENIX Annual Technical Conference

IBM Power5 Chip: A Dual-Core Multithreaded Processor
IEEE Micro

Experiences with the Denelcor HEP
Parallel Computing

Cited by (6)

Container-based operating system virtualization: a scalable, high-performance alternative to hypervisors
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007

Do commodity SMT processors need more OS research?
ACM SIGOPS Operating Systems Review

Shore-MT: a scalable storage manager for the multicore era
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology

SIP server performance on multicore systems
IBM Journal of Research and Development

A Superscalar software architecture model for Multi-Core Processors (MCPs)
Journal of Systems and Software

Analyzing the effects of hyperthreading on the performance of data management systems
International Journal of Parallel Programming

Abstract

This paper examines the performance of simultaneous multithreading (SMT) for network servers using actual hardware, multiple network server applications, and several workloads. Using three versions of the Intel Xeon processor with Hyper-Threading, we perform macroscopic analysis as well as microarchitectural measurements to understand the origins of the performance bottlenecks for SMT processors in these environments. The results of our evaluation suggest that the current SMT support in the Xeon is application- and workload-sensitive, and may not yield significant benefits for network servers.

In general, we find that enabling SMT on real hardware usually produces only slight performance gains, and can sometimes lead to performance loss. In the uniprocessor case, previous studies appear to have neglected the OS overhead in switching from a uniprocessor kernel to an SMT-enabled kernel. The performance loss associated with such support is comparable to the gains provided by SMT. In the 2-way multiprocessor case, the higher number of memory references from SMT often causes the memory system to become the bottleneck, offsetting any processor utilization gains. This effect is compounded by the growing gap between processor speeds and memory latency. In trying to understand the large gains shown by simulation studies, we find that while the general trends for microarchitectural behavior agree with real hardware, differences in sizing assumptions and performance models yield much more optimistic benefits for SMT than we observe.
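The uniprocessor finding can be illustrated with a back-of-the-envelope calculation. The sketch below assumes a simple multiplicative model that is not taken from the paper: switching from the uniprocessor kernel to the SMT-enabled kernel costs a fraction `smt_kernel_overhead`, and enabling the second hardware thread then adds a fraction `smt_gain` on top of that kernel. The function name, parameters, and numbers are hypothetical and are not measurements from the study. When the two fractions are comparable, the net gain over a uniprocessor-kernel baseline is close to zero, even though a comparison that ignores the kernel switch would report the full `smt_gain`.

```python
def net_speedup_vs_up_kernel(smt_gain: float, smt_kernel_overhead: float) -> float:
    """Throughput with SMT enabled, relative to the uniprocessor kernel.

    Hypothetical multiplicative model (an assumption, not the paper's method):
    - smt_kernel_overhead: fractional slowdown from running the SMT-enabled
      kernel instead of the uniprocessor kernel, before any SMT benefit.
    - smt_gain: fractional speedup from the second hardware thread, measured
      against the SMT-enabled kernel with only one thread in use.
    """
    if not 0.0 <= smt_kernel_overhead < 1.0:
        raise ValueError("smt_kernel_overhead must be in [0, 1)")
    return (1.0 - smt_kernel_overhead) * (1.0 + smt_gain)


if __name__ == "__main__":
    # Illustrative numbers only: a 10% SMT gain and a 10% kernel overhead.
    gain, overhead = 0.10, 0.10
    ignoring_kernel_switch = 1.0 + gain
    vs_up_kernel = net_speedup_vs_up_kernel(gain, overhead)
    print(f"speedup ignoring the kernel switch: {ignoring_kernel_switch:.3f}")
    print(f"speedup vs. uniprocessor kernel:    {vs_up_kernel:.3f}")
```

Under this model a 10% gain paired with a 10% overhead nets out slightly below 1.0, which is the sense in which a kernel cost "comparable to the gains" can erase the benefit of SMT.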