Tolerating Late Memory Traps in Dynamically Scheduled Processors

Authors:
Xiaogang Qiu;Michel Dubois
Affiliations:
-;-
Venue:
IEEE Transactions on Computers
Year:
2004

Citing 34
Cited 3

Software-controlled caches in the VMP multiprocessor

ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Memory access buffering in multiprocessors

ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Memory coherence in shared virtual memory systems

ACM Transactions on Computer Systems (TOCS)
Software prefetching

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Virtual memory primitives for user programs

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Performance evaluation of memory consistency models for shared-memory multiprocessors

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Tradeoffs in supporting two page sizes

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Cooperative shared memory: software and hardware for scalable multiprocessors

ACM Transactions on Computer Systems (TOCS)
Design tradeoffs for software-managed TLBs

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The SPARC architecture manual (version 9)

The SPARC architecture manual (version 9)
Software-extended coherent shared memory: performance and cost

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Fine-grain access control for distributed shared memory

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Efficient strategies for software-only protocols in shared-memory multiprocessors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
MGS: a multigrain shared memory system

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Informing memory operations: providing memory performance feedback in modern processors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
An evaluation of memory consistency models for shared-memory systems with ILP processors

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Increasing TLB reach using superpages backed by shadow memory

Proceedings of the 25th annual international symposium on Computer architecture
Options for dynamic address translation in COMAs

Proceedings of the 25th annual international symposium on Computer architecture
A look at several memory management units, TLB-refill mechanisms, and page table organizations

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Tolerating late memory traps in ILP processors

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Is SC + ILP = RC?

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Simultaneous subordinate microthreading (SSMT)

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
The use of multithreading for exception handling

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Implementation of precise interrupts in pipelined processors

ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
The MIPS R10000 Superscalar Microprocessor

IEEE Micro
Virtual-Address Caches Part 1: Problems and Solutions in Uniprocessors

IEEE Micro
Virtual-Address Caches, Part 2: Multiprocessor Issues

IEEE Micro
Hardware Versus Software Implementation of COMA

ICPP '97 Proceedings of the international Conference on Parallel Processing
Software-Managed Address Translation

HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
WildFire: A Scalable Path for SMPs

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Towards Virtually-Addressed Memory Hierarchies

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture

TCP Onloading for Data Center Servers

Computer
A low-cost memory remapping scheme for address bus protection

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
A low-cost memory remapping scheme for address bus protection

Journal of Parallel and Distributed Computing

Quantified Score

Hi-index	14.99

Visualization

Abstract

Abstract--In the past few years, exception support for memory functions such as virtual memory, informing memory operations, software assist for shared memory protocols, or interactions with processors in memory has been advocated in various research papers. These memory traps may occur on a miss in the cache hierarchy or on a local or remote memory access. However, contemporary, dynamically scheduled processors only support memory exceptions detected in the TLB associated with the first-level cache. They do not support memory exceptions taken deep in the memory hierarchy. In this case, memory traps may be late, in the sense that the exception condition may still be undecided when a long-latency memory instruction reaches the retirement stage. In this paper we evaluate through simulation the overhead of memory traps in dynamically scheduled processors, focusing on the added overhead incurred when a memory trap is late. We also propose some simple mechanisms to reduce this added overhead while preserving the memory consistency model. With more aggressive memory access mechanisms in the processor we observe that the overhead of all memory traps--either early or late--is increased while the lateness of a trap becomes largely tolerated so that the performance gap between early and late memory traps is greatly reduced. Additionally, because of caching effects in the memory hierarchy, the frequency of memory traps usually decreases as they are taken deeper in the memory hierarchy and their overall impact on execution times becomes negligible. We conclude that support for memory traps taken throughout the memory hierarchy could be added to dynamically scheduled processors at low hardware cost and little performance degradation.