The context-switch overhead inflicted by hardware interrupts (and the enigma of do-nothing loops)

  • Authors:
  • Dan Tsafrir

  • Affiliations:
  • IBM T. J. Watson Research Center, Yorktown Heights, NY

  • Venue:
ExpCS '07: Proceedings of the 2007 Workshop on Experimental Computer Science
  • Year:
  • 2007


Abstract

The overhead of a context switch is typically associated with multitasking, where several applications share a processor. But even if only one runnable application is present in the system and supposedly runs alone, it is still repeatedly preempted in favor of a different thread of execution, namely, the operating system that services periodic clock interrupts. We employ two complementary methodologies to measure the overhead incurred by such events and obtain contradictory results. The first methodology systematically changes the interrupt frequency and measures by how much this prolongs the duration of a program that sorts an array. The overall overhead is found to be 0.5-1.5% at 1000 Hz, linearly proportional to the tick rate, and steadily declining as the speed of processors increases. If the kernel is configured such that each tick is slowed down by an access to an external time source, then the direct overhead dominates. Otherwise, the relative weight of the indirect portion grows steadily with processor speed, accounting for up to 85% of the total. The second methodology repeatedly executes a simplistic loop (calibrated to take 1 ms), measures the actual execution time, and analyzes the perturbations. Some loop implementations yield results similar to the above, but others indicate that the overhead is actually an order of magnitude bigger, or worse. The phenomenon was observed on IA32, IA64, and Power processors, the latter being part of the ASC Purple supercomputer. Indeed, the effect is dramatically amplified for parallel jobs, where one late thread holds up all its peers, causing a slowdown that is dominated by the per-node latency (numerator) and the job granularity (denominator). We show the effect is due to an unexplained interrupt/loop interaction; the question of whether this hardware misfeature is experienced by real applications remains open.
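The second methodology described above can be sketched as follows. This is a minimal illustration, not the paper's actual harness: it calibrates a do-nothing loop to roughly 1 ms, times many repetitions with a high-resolution clock, and flags samples that run noticeably longer than the fastest one. All function names and the 1.5x spike threshold are assumptions chosen for the sketch.

```python
import time

def calibrate(target_s=1e-3):
    """Find an iteration count whose do-nothing loop takes ~target_s."""
    n = 1000
    while True:
        t0 = time.perf_counter()
        for _ in range(n):
            pass
        if time.perf_counter() - t0 >= target_s:
            return n
        n *= 2

def sample(n_iters, samples=200):
    """Run the calibrated loop repeatedly; record each wall-clock duration."""
    durations = []
    for _ in range(samples):
        t0 = time.perf_counter()
        for _ in range(n_iters):
            pass
        durations.append(time.perf_counter() - t0)
    return durations

n = calibrate()
durs = sample(n)
base = min(durs)
# Perturbed samples: runs noticeably longer than the fastest one,
# typically reflecting clock interrupts and other OS activity.
spikes = [d for d in durs if d > 1.5 * base]
print(f"baseline {base * 1e3:.3f} ms, {len(spikes)} perturbed of {len(durs)} samples")
```

The distribution of `durs` (rather than its mean) is what exposes the interrupt overhead: the fastest sample approximates the undisturbed loop, and the tail captures the preemptions the abstract analyzes.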