DLL-conscious instruction fetch optimization for SMT processors

Authors:
Fayez Mohamood;Mrinmoy Ghosh;Hsien-Hsin S. Lee
Affiliations:
Georgia Tech Electrical and Computer Engineering, 266 Ferst Drive, School of ECE Georgia Tech, Atlanta, GA 30332, United States (all three authors)
Venue:
Journal of Systems Architecture: the EUROMICRO Journal
Year:
2008

Abstract

Simultaneous multithreading (SMT) processors can issue multiple instructions from distinct processes or threads in the same cycle. This technique increases overall throughput by keeping pipeline resources busier, at the potential expense of single-thread performance because resources are shared. In the software domain, applications and operating systems use an increasing number of dynamically linked libraries (DLLs), which provide better flexibility and modularity and enable code sharing. A significant fraction of execution time in today's software is spent executing standard DLL instructions, which are shared among multiple threads or processes. However, on an SMT processor with a virtually-indexed cache implementation, existing instruction fetch mechanisms can induce unnecessary false I-TLB and I-Cache misses on DLL-based instructions that are intended to be shared. This problem is more prominent when multiple independent threads execute concurrently on an SMT processor. In this work, we investigate a neglected form of contention between running threads in the I-TLB and I-Cache (both VIVT and VIPT) caused by DLLs. To address these shortcomings, we propose a system-level technique involving a lightweight modification to the microarchitecture and the OS. By exploiting the nature of DLLs, our optimized system reinstates the intended sharing of DLLs on an SMT machine. Using Microsoft Windows-based applications, our simulation results show that the optimized instruction fetch mechanism can reduce the number of DLL misses by up to 5.5 times and improve instruction cache hit rates by up to 62%, resulting in DLL IPC improvements of up to 30% and overall IPC improvements of up to 15%.
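
The false-miss effect described in the abstract can be illustrated with a toy model. The sketch below is not the mechanism proposed in the paper, whose details the abstract does not give. It assumes a fully associative, LRU-replaced I-TLB whose entries are tagged with a thread ID, and it assumes the DLL is mapped at the same virtual page numbers in every thread's address space. The class `ToyITLB`, the flag `share_dll_pages`, and the helper `count_misses` are hypothetical names introduced only for this illustration. With the flag off, each thread takes its own cold miss on every DLL page; with it on, the second thread hits entries installed by the first.

```python
from collections import OrderedDict


class ToyITLB:
    """Fully associative, LRU-replaced toy I-TLB (hypothetical model)."""

    def __init__(self, entries, share_dll_pages):
        self.entries = entries
        self.share_dll_pages = share_dll_pages
        self.table = OrderedDict()  # keys ordered from least to most recent
        self.misses = 0

    def _key(self, thread_id, vpn, is_dll_page):
        # Assumption: the OS marks DLL pages as shared, so the lookup
        # ignores the thread ID for those pages only.
        if self.share_dll_pages and is_dll_page:
            return ("shared", vpn)
        return (thread_id, vpn)

    def access(self, thread_id, vpn, is_dll_page):
        """Return True on a hit, False on a miss (and install the entry)."""
        key = self._key(thread_id, vpn, is_dll_page)
        if key in self.table:
            self.table.move_to_end(key)  # refresh LRU position
            return True
        self.misses += 1
        if len(self.table) >= self.entries:
            self.table.popitem(last=False)  # evict least recently used
        self.table[key] = True
        return False


def count_misses(share_dll_pages, threads=2, dll_pages=4, rounds=3, entries=16):
    """Each thread repeatedly fetches from the same DLL pages."""
    tlb = ToyITLB(entries, share_dll_pages)
    for _ in range(rounds):
        for thread_id in range(threads):
            for vpn in range(dll_pages):
                tlb.access(thread_id, vpn, is_dll_page=True)
    return tlb.misses


baseline = count_misses(share_dll_pages=False)  # 8: every thread misses once per page
shared = count_misses(share_dll_pages=True)     # 4: only the first toucher misses
print(baseline, shared)
```

In this toy setting the thread-tagged lookup takes twice as many misses as the shared lookup, because the two threads duplicate entries for identical DLL pages. The numbers come from the model's parameters and say nothing about the magnitudes reported in the paper.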