Effects of Multithreading on Cache Performance

Authors:
Hantak Kwak; Ben Lee; Ali R. Hurson; Suk-Han Yoon; Woo-Jong Hahn
Affiliations:
Oregon State Univ., Corvallis; Oregon State Univ., Corvallis; Pennsylvania State Univ., University Park; Electronics and Telecommunications Research Institute, Taejon, Korea; Electronics and Telecommunications Research Institute, Taejon, Korea
Venue:
IEEE Transactions on Computers - Special issue on cache memory and related problems
Year:
1999

Abstract

As the performance gap between processor and memory grows, memory latency becomes a major bottleneck in achieving high processor utilization. Multithreading has emerged as one of the most promising techniques for tolerating memory latency by exploiting thread-level parallelism. The question remains, however, as to how effective multithreading is at tolerating memory latency. The performance of multithreading is affected not only by the overlapping of memory latency with useful computation, but also depends strongly on cache behavior and on the overhead of multithreading (e.g., thread management and context-switch costs). In particular, multithreading affects the behavior of caches and, thus, overall performance in a nontrivial fashion. To study these issues, this paper presents the Multithreaded Virtual Processor (MVP) model. MVP integrates the multithreaded programming paradigm with a modern superscalar processor that supports fast context switching and thread scheduling. Our studies with MVP show that, in general, performance improvements come not only from tolerating memory latency but also from lower cache miss rates due to the exploitation of data locality. However, multithreading places additional stress on the memory hierarchy because of interference among threads. In addition, the dynamic behavior of multithreaded execution degrades instruction locality, which results in a high number of misses in the L1 instruction cache.
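
The trade-off between overlapping memory latency and paying context-switch costs can be illustrated with a simplified, textbook-style utilization model. This is only a sketch under stated assumptions and is not the MVP model from the paper: each thread is assumed to run for a fixed number of cycles between cache misses, every miss takes the same latency, and every switch costs the same number of cycles. The function name and all parameter names and values are hypothetical. Cache interference among threads, which the abstract identifies as important, is not modeled; it would appear here as a shorter run length as the thread count grows.

```python
def utilization(n_threads, run_length, latency, switch_cost):
    """Estimate processor utilization for a switch-on-miss multithreaded core.

    Assumptions (illustrative only, not taken from the paper):
      run_length  -- cycles a thread executes between cache misses
      latency     -- cycles needed to service one miss
      switch_cost -- cycles lost on each context switch
    """
    # Linear region: too few threads to cover the miss latency, so the
    # processor idles for part of each (run_length + latency) period.
    linear = n_threads * run_length / (run_length + latency)
    # Saturated region: latency is fully hidden, and the only loss is
    # the context-switch overhead paid after every run.
    saturated = run_length / (run_length + switch_cost)
    return min(linear, saturated)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        u = utilization(n, run_length=20, latency=80, switch_cost=4)
        print(f"{n} thread(s): utilization = {u:.3f}")
```

In this sketch, adding threads beyond the saturation point, (run_length + latency) / (run_length + switch_cost), yields no further gain, which is one reason the overhead of multithreading limits its benefit.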