Reducing Design Complexity of the Load/Store Queue

  • Authors:
  • Il Park; Chong Liang Ooi; T. N. Vijaykumar

  • Affiliations:
  • School of Electrical and Computer Engineering, Purdue University (all authors)

  • Venue:
  • Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
  • Year:
  • 2003


Abstract

With faster CPU clocks and wider pipelines, all relevant microarchitecture components should scale accordingly. There have been many proposals for scaling the issue queue, register file, and cache hierarchy. However, nothing has been done for scaling the load/store queue, despite the increasing pressure on the load/store queue in terms of capacity and search bandwidth. The load/store queue is a CAM structure which holds in-flight memory instructions and supports simultaneous searches to honor memory dependencies and memory consistency models. Therefore, it is difficult to scale the load/store queue.

In this study, we introduce novel techniques to scale the load/store queue. We propose two techniques, a store-load pair predictor and a load buffer, to reduce the search bandwidth requirement, and one technique, segmentation, to scale the size. We show that a load/store queue using our predictor and load buffer needs only one port to outperform a conventional two-ported load/store queue. Compared to the same base case, segmentation alone achieves speedups of 5% for integer benchmarks and 19% for floating-point benchmarks. A one-ported load/store queue using all of our techniques improves performance on average by 6% and 23%, and up to 15% and 59%, for integer and floating-point benchmarks, respectively, over a two-ported conventional load/store queue.
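
To make the search-bandwidth idea concrete, below is a minimal C++ sketch of a PC-indexed store-load pair predictor that filters which loads actually perform the associative store-queue search. This is an illustration of the general idea described in the abstract, not the paper's mechanism: the class names (StoreQueue, PairPredictor), the table size, and the 1-bit training policy are assumptions for this example, and the mis-prediction verification step is omitted.

```cpp
// Illustrative sketch (assumed names and policies, not the paper's design):
// a store-load pair predictor lets most loads skip the CAM search of the
// store queue, reducing the search-bandwidth demand on the load/store queue.
#include <cstdint>
#include <cstdio>
#include <optional>
#include <vector>

struct StoreEntry {
    uint64_t pc;    // store instruction address
    uint64_t addr;  // memory address written
    uint64_t data;  // value available for forwarding
};

// Simple CAM-like store queue: a forwarding lookup scans all in-flight stores.
class StoreQueue {
public:
    void insert(const StoreEntry& e) { entries_.push_back(e); }
    // Associative search: the youngest store to the same address wins.
    std::optional<StoreEntry> search(uint64_t addr) const {
        for (auto it = entries_.rbegin(); it != entries_.rend(); ++it)
            if (it->addr == addr) return *it;
        return std::nullopt;
    }
private:
    std::vector<StoreEntry> entries_;
};

// PC-indexed predictor: remembers which load PCs have forwarded from a store
// before; only those loads pay for a store-queue search next time.
class PairPredictor {
public:
    explicit PairPredictor(size_t sets = 1024) : table_(sets, false) {}
    bool predicts_forwarding(uint64_t load_pc) const {
        return table_[load_pc % table_.size()];
    }
    void train(uint64_t load_pc, bool forwarded) {
        table_[load_pc % table_.size()] = forwarded;
    }
private:
    std::vector<bool> table_;  // 1-bit "has forwarded" history per set
};

int main() {
    StoreQueue sq;
    PairPredictor pred;
    sq.insert({0x400100, 0x1000, 42});

    uint64_t load_pc = 0x400200, load_addr = 0x1000;

    // First encounter: the predictor says "no forwarding expected", so the
    // load would read the cache directly; a later verification step (omitted
    // here) would detect the missed forwarding and train the predictor.
    if (!pred.predicts_forwarding(load_pc)) {
        pred.train(load_pc, sq.search(load_addr).has_value());
    }

    // Subsequent encounters: only predicted loads search the store queue.
    if (pred.predicts_forwarding(load_pc)) {
        if (auto hit = sq.search(load_addr))
            std::printf("load %#llx forwards %llu from store %#llx\n",
                        (unsigned long long)load_pc,
                        (unsigned long long)hit->data,
                        (unsigned long long)hit->pc);
    }
    return 0;
}
```

In this sketch the bandwidth saving comes from the guard around `sq.search()`: loads with no forwarding history never issue a store-queue search, so fewer CAM ports are needed for the same throughput.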