Improving applications performance: a memory model and cache architecture

  • Authors:
  • D. N. Jutla; P. Bodorik

  • Affiliations:
  • Faculty of Computer Science, DalTech, Dalhousie University, Halifax, Nova Scotia, Canada (both authors)

  • Venue:
  • ACM SIGARCH Computer Architecture News
  • Year:
  • 1997


Abstract

This paper presents a memory model and architecture for synchronizing threads or tasks that access regions of virtual memory. Access control is defined on a memory region through a view, which specifies both the size of access units and the access protocol in terms of a Finite State Machine (FSM). The size of access units, although fixed within a view, can vary across views and is thus customizable to applications. Variable-sized access units are obtained without altering the underlying fixed-size paging implementation. Because access control is defined through FSM definitions, any protocol that is decomposable into a set of states and expressible as an FSM can be supported.

A cache-based architecture of the Protection Control Unit (PCU), which decides whether a read/write access to an access unit of a region/view can proceed or leads to a fault, is presented. The PCU is not invoked on every read/write memory access, but only on data cache misses and write access faults. The cache-based architecture provides flexible hardware support for access control: by loading the caches with the definitions of different FSMs, different protocols can be supported. Furthermore, by supporting changes in access state through hardware state transitions, the frequency of context switching is reduced.

Trace-driven simulation is used to examine the delay in the memory hierarchy due to the inclusion of the caches in the PCU, and to examine the delay in the memory hierarchy for a conventional software implementation of the same access control protocol. A TPC-C benchmark application was traced under different transaction loads; the results show that, for the modeled application, it is the number of TLB accesses (approximately 15 times the number of PCU accesses) that incurs the dominant delay, as compared to the delay in the PCU memory hierarchy.
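The view mechanism described above can be illustrated with a minimal software sketch. This is not the paper's hardware design: the class and protocol below are illustrative assumptions showing only the core idea that a view partitions a region into fixed-size access units and an FSM table decides, per unit, whether an access proceeds or faults.

```python
# Hypothetical sketch of FSM-based access control over a memory region.
# A View fixes an access-unit size and an FSM; each access unit carries
# its own FSM state, and an access either transitions the state or faults.

FAULT = "FAULT"

class View:
    def __init__(self, unit_size, fsm, initial_state):
        self.unit_size = unit_size   # access-unit size, fixed within this view
        self.fsm = fsm               # transition table: (state, op) -> next state
        self.initial = initial_state
        self.states = {}             # per-unit current state (unit index -> state)

    def access(self, address, op):
        """Return the unit's new state after op ('read'/'write'), or FAULT."""
        unit = address // self.unit_size        # map address to its access unit
        state = self.states.get(unit, self.initial)
        nxt = self.fsm.get((state, op))
        if nxt is None:
            return FAULT                        # protocol forbids this access
        self.states[unit] = nxt                 # state transition on success
        return nxt

# Example protocol (an assumption for illustration): a unit must be read
# before it may be written; writing an untouched unit is a fault.
fsm = {
    ("free", "read"): "read_locked",
    ("read_locked", "read"): "read_locked",
    ("read_locked", "write"): "written",
    ("written", "write"): "written",
}
view = View(unit_size=256, fsm=fsm, initial_state="free")
```

Because the protocol is just a transition table, swapping in a different table supports a different protocol without changing the access-check logic, which mirrors the flexibility the paper attributes to loading the PCU caches with different FSM definitions.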