Software-controlled caches in the VMP multiprocessor
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Memory access buffering in multiprocessors
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Memory coherence in shared virtual memory systems
ACM Transactions on Computer Systems (TOCS)
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Virtual memory primitives for user programs
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Performance evaluation of memory consistency models for shared-memory multiprocessors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Tradeoffs in supporting two page sizes
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Cooperative shared memory: software and hardware for scalable multiprocessors
ACM Transactions on Computer Systems (TOCS)
Design tradeoffs for software-managed TLBs
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The SPARC architecture manual (version 9)
The SPARC architecture manual (version 9)
Software-extended coherent shared memory: performance and cost
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Fine-grain access control for distributed shared memory
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Efficient strategies for software-only protocols in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
MGS: a multigrain shared memory system
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Informing memory operations: providing memory performance feedback in modern processors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
An evaluation of memory consistency models for shared-memory systems with ILP processors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Increasing TLB reach using superpages backed by shadow memory
Proceedings of the 25th annual international symposium on Computer architecture
Options for dynamic address translation in COMAs
Proceedings of the 25th annual international symposium on Computer architecture
A look at several memory management units, TLB-refill mechanisms, and page table organizations
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Tolerating late memory traps in ILP processors
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Simultaneous subordinate microthreading (SSMT)
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
The use of multithreading for exception handling
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Implementation of precise interrupts in pipelined processors
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Hardware Versus Software Implementation of COMA
ICPP '97 Proceedings of the international Conference on Parallel Processing
Software-Managed Address Translation
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
WildFire: A Scalable Path for SMPs
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Towards Virtually-Addressed Memory Hierarchies
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
A low-cost memory remapping scheme for address bus protection
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
A low-cost memory remapping scheme for address bus protection
Journal of Parallel and Distributed Computing
Hi-index | 14.99 |
Abstract--In the past few years, exception support for memory functions such as virtual memory, informing memory operations, software assist for shared memory protocols, or interactions with processors in memory has been advocated in various research papers. These memory traps may occur on a miss in the cache hierarchy or on a local or remote memory access. However, contemporary, dynamically scheduled processors only support memory exceptions detected in the TLB associated with the first-level cache. They do not support memory exceptions taken deep in the memory hierarchy. In this case, memory traps may be late, in the sense that the exception condition may still be undecided when a long-latency memory instruction reaches the retirement stage. In this paper we evaluate through simulation the overhead of memory traps in dynamically scheduled processors, focusing on the added overhead incurred when a memory trap is late. We also propose some simple mechanisms to reduce this added overhead while preserving the memory consistency model. With more aggressive memory access mechanisms in the processor we observe that the overhead of all memory traps--either early or late--is increased while the lateness of a trap becomes largely tolerated so that the performance gap between early and late memory traps is greatly reduced. Additionally, because of caching effects in the memory hierarchy, the frequency of memory traps usually decreases as they are taken deeper in the memory hierarchy and their overall impact on execution times becomes negligible. We conclude that support for memory traps taken throughout the memory hierarchy could be added to dynamically scheduled processors at low hardware cost and little performance degradation.