The migration prefetcher: Anticipating data promotion in dynamic NUCA caches

  • Authors:
  • Javier Lira;Timothy M. Jones;Carlos Molina;Antonio González

  • Affiliations:
  • Universitat Politècnica de Catalunya, Spain;University of Cambridge, UK;Universitat Rovira i Virgili, Spain;Intel Barcelona Research Center, Intel Labs - UPC, Spain

  • Venue:
  • ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

The exponential increase in multicore processor (CMP) cache sizes accompanied by growing on-chip wire delays make it difficult to implement traditional caches with a single, uniform access latency. Non-Uniform Cache Architecture (NUCA) designs have been proposed to address this problem. A NUCA divides the whole cache memory into smaller banks and allows banks nearer a processor core to have lower access latencies than those further away, thus mitigating the effects of the cache's internal wires. Determining the best placement for data in the NUCA cache at any particular moment during program execution is crucial for exploiting the benefits that this architecture provides. Dynamic NUCA (D-NUCA) allows data to be mapped to multiple banks within the NUCA cache, and then uses data migration to adapt data placement to the program's behavior. Although the standard migration scheme is effective in moving data to its optimal position within the cache, half the hits still occur within non-optimal banks. This paper reduces this number by anticipating data migrations and moving data to the optimal banks in advance of being required. We introduce a prefetcher component to the NUCA cache that predicts the next memory request based on the past. We develop a realistic implementation of this prefetcher and, furthermore, experiment with a perfect prefetcher that always knows where the data resides, in order to evaluate the limits of this approach. We show that using our realistic data prefetching to anticipate data migrations in the NUCA cache can reduce the access latency by 15% on average and achieve performance improvements of up to 17%.