Load squared: adding logic close to memory to reduce the latency of indirect loads with high miss ratios

  • Authors:
  • Sami Yehia; Jean-Francois Collard; Olivier Temam

  • Affiliations:
  • ARM Ltd, Cambridge, UK; Hewlett-Packard Labs, Palo Alto, CA; University of Paris-Sud, France

  • Venue:
  • MEDEA '04 Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications, systems and architecture
  • Year:
  • 2004

Abstract

Indirect memory accesses, where a load is fed by another load, are ubiquitous because of rich data structures and sophisticated software conventions, such as the use of linkage tables and position independent code. Unfortunately, they can be costly: if both loads miss, two round trips to memory are required even though the role of the first load is often limited to fetching the address of the second load. To reduce the total latency of such indirect accesses, a new instruction called load squared is introduced. A load squared does two fetches, the first fetch reading the target address of the second. (An offset is optionally added to the result of the first fetch.) The load squared operation is performed by memory-side logic (typically, the memory controller if it isn't located on the main processor chip). In this study, load squared is not an architecturally visible instruction: the micro-architecture transparently decides which loads should be replaced by loads squared. We show that performance is sometimes improved significantly, and never degraded.
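
The semantics described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: memory is modeled as a word-addressed array, both fetches are assumed to miss in the caches, and the names `memory_t`, `mem_fetch`, `indirect_load`, and `load_squared` are hypothetical. The model counts processor-to-memory round trips to show where the saving would come from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: memory is an array of 64-bit words indexed by
 * word address. round_trips counts processor<->memory round trips. */
typedef struct {
    const uint64_t *words;
    size_t          n_words;
    unsigned        round_trips;
} memory_t;

/* A single fetch carried out at the memory itself. */
static uint64_t mem_fetch(const memory_t *m, uint64_t addr)
{
    assert(addr < m->n_words);
    return m->words[addr];
}

/* Conventional indirect access: two dependent loads issued by the
 * processor. Assuming both miss, each costs one round trip. */
uint64_t indirect_load(memory_t *m, uint64_t addr, uint64_t offset)
{
    m->round_trips++;                    /* first load: fetch the pointer */
    uint64_t ptr = mem_fetch(m, addr);
    m->round_trips++;                    /* second load: fetch the data   */
    return mem_fetch(m, ptr + offset);
}

/* Load squared: the processor sends one request; memory-side logic
 * performs both fetches (adding the optional offset) and returns only
 * the final value, so one round trip is counted. */
uint64_t load_squared(memory_t *m, uint64_t addr, uint64_t offset)
{
    m->round_trips++;                    /* single request to memory side */
    uint64_t ptr = mem_fetch(m, addr);   /* done next to memory           */
    return mem_fetch(m, ptr + offset);   /* done next to memory           */
}
```

Both functions return the same value for the same arguments; in this model the only difference is the number of round trips charged to the processor, which is the latency the abstract aims to reduce.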