OBQA: Smart and cost-efficient queue scheme for Head-of-Line blocking elimination in fat-trees

  • Authors:
  • Jesus Escudero-Sahuquillo; Pedro J. Garcia; Francisco J. Quiles; Jose Flich; Jose Duato

  • Affiliations:
  • Computing Systems Department, Escuela Superior de Ingeniería Informática, Universidad de Castilla-La Mancha, Campus Universitario, s/n 02071, Albacete, Spain (Escudero-Sahuquillo, Garcia, Quiles)
  • Department of Computer Engineering (DISCA), Universitat Politècnica de València, Camino de Vera, s/n 46071, Valencia, Spain (Flich, Duato)

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2011

Abstract

High-speed interconnection networks are essential elements of high-performance parallel-computing systems. One of the most common interconnection network topologies is the fat-tree, whose advantages have made it the favorite topology of many interconnect designers. One of these advantages is the possibility of using simple but efficient routing algorithms, such as the recently proposed deterministic routing algorithm referred to as DET, which offers performance similar to (or better than) adaptive routing while reducing complexity and guaranteeing in-order packet delivery. However, like other deterministic routing proposals, DET cannot react when packets contend intensely for network resources, which leads to Head-of-Line (HoL) blocking that degrades network performance. In this paper, we describe and evaluate OBQA, a simple queue scheme that efficiently reduces HoL blocking in fat-trees using the DET routing algorithm without significantly increasing switch complexity or required silicon area. Additionally, we propose an implementation of OBQA in a feasible switch architecture.