Exploiting execution locality with a decoupled Kilo-instruction processor

  • Authors:
  • Miquel Pericàs; Adrian Cristal; Ruben González; Daniel A. Jiménez; Mateo Valero

  • Affiliations:
  • Computer Architecture Department, Technical University of Catalonia, Barcelona, Spain and Barcelona Supercomputing Center, Barcelona, Spain; Barcelona Supercomputing Center, Barcelona, Spain; Computer Architecture Department, Technical University of Catalonia, Barcelona, Spain; Department of Computer Science, The University of Texas at San Antonio, San Antonio, TX; Computer Architecture Department, Technical University of Catalonia, Barcelona, Spain and Barcelona Supercomputing Center, Barcelona, Spain

  • Venue:
  • ISHPC'05/ALPS'06: Proceedings of the 6th International Symposium on High-Performance Computing and 1st International Conference on Advanced Low Power Systems
  • Year:
  • 2005

Abstract

Overcoming increasing memory latency is one of the main problems that microprocessor designers have faced over the years. The two basic techniques introduced to mitigate latencies are caches and out-of-order execution. However, neither of these solutions is adequate for hiding off-chip memory accesses on the order of 200 cycles or more. Theoretically, increasing the size of the instruction window would allow much longer latencies to be hidden, but scaling the structures to support thousands of in-flight instructions would be prohibitively expensive. Fortunately, the distribution of instruction issue times in the presence of L2 cache misses is highly correlated. This paper describes this phenomenon, called Execution Locality, and shows how it can be exploited with an inexpensive microarchitecture consisting of two linked cores. This Decoupled Kilo-Instruction Processor (D-KIP) is very effective in recovering lost potential performance. Extensive simulations show that speedups of up to 379% are possible for numerical benchmarks, thanks to the exploitation of impressive degrees of Memory-Level Parallelism (MLP) and the execution of independent instructions in the shadow of L2 misses.
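
To make the idea of correlated issue times concrete, the following is a minimal toy model, not the simulator or the microarchitecture used in the paper. It assumes an unbounded instruction window, a 200-cycle L2 miss latency (the figure the abstract cites), a 1-cycle latency for everything else, and a hypothetical trace format (`Instr`, `issue_delays`, and `split_by_locality` are illustrative names). Under these assumptions, instructions fall into two clusters: those that issue almost immediately after fetch, and those that wait on a miss.

```python
from dataclasses import dataclass

MISS_LATENCY = 200  # cycles; the abstract cites off-chip accesses of ~200 cycles
ALU_LATENCY = 1     # assumed latency for every instruction that does not miss


@dataclass
class Instr:
    sources: tuple = ()    # indices of earlier instructions this one depends on
    l2_miss: bool = False  # True if this instruction is a load that misses in L2


def issue_delays(trace, fetch_width=4):
    """Cycles between fetch and issue for each instruction.

    Assumes an unbounded window: an instruction issues as soon as it has
    been fetched and all of its sources have completed.
    """
    done = []    # completion cycle of each instruction so far
    delays = []
    for i, ins in enumerate(trace):
        fetch = i // fetch_width
        ready = max([done[s] for s in ins.sources], default=fetch)
        issue = max(fetch, ready)
        latency = MISS_LATENCY if ins.l2_miss else ALU_LATENCY
        done.append(issue + latency)
        delays.append(issue - fetch)
    return delays


def split_by_locality(delays, threshold=MISS_LATENCY // 2):
    """Partition instruction indices into fast-issuing and miss-dependent sets.

    The threshold is an arbitrary assumption chosen to separate the two clusters.
    """
    fast = [i for i, d in enumerate(delays) if d < threshold]
    slow = [i for i, d in enumerate(delays) if d >= threshold]
    return fast, slow


# Hypothetical six-instruction trace with two independent L2 misses.
EXAMPLE_TRACE = [
    Instr(l2_miss=True),     # 0: load that misses in L2
    Instr(sources=(0,)),     # 1: consumes the missing load
    Instr(),                 # 2: independent of any miss
    Instr(sources=(2,)),     # 3: depends only on instruction 2
    Instr(l2_miss=True),     # 4: second, independent miss
    Instr(sources=(4, 1)),   # 5: consumes both miss chains
]

if __name__ == "__main__":
    delays = issue_delays(EXAMPLE_TRACE)
    print(delays)                     # [0, 200, 0, 1, 0, 200]
    print(split_by_locality(delays))  # ([0, 2, 3, 4], [1, 5])
```

In this sketch the two missing loads issue at cycles 0 and 1, so their latencies overlap almost entirely, which is a small-scale picture of Memory-Level Parallelism. Instructions 2 and 3 complete in the shadow of the misses, while instructions 1 and 5 form the slow cluster that a large window would otherwise have to hold.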