Evaluation of the device driver availability in Dawning4000A

  • Authors:
  • Yuanxia You; Dan Meng; Gang Xue; Jie Ma

  • Affiliations:
  • National Research Center for Intelligent Computer Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, P.R. China; National Research Center for Intelligent Computer Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, P.R. China; Shanghai Supercomputer Center, Shanghai, P.R. China; National Research Center for Intelligent Computer Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, P.R. China

  • Venue:
  • GPC'06: Proceedings of the First International Conference on Advances in Grid and Pervasive Computing
  • Year:
  • 2006

Abstract

Device drivers have been claimed to be the most error-prone part of the kernel source code, and many error-tolerance and error-prevention approaches have been developed or proposed in response to this claim. However, after analyzing three months of event logs and maintenance records from Dawning4000A, a former TOP10 supercomputer, we find that device driver errors are not the most critical cause of crashes on this system. We believe device driver errors call for more development and debugging effort rather than tolerance mechanisms. We also suggest that drivers become more tolerant of device errors, especially errors from storage devices.
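The abstract's method, attributing each crash found in event logs and maintenance records to a cause and then comparing how often each cause occurs, can be illustrated with a minimal Python sketch. It assumes each crash has already been reduced to a (node, cause) record; the record format, the cause labels, and the helpers `rank_crash_causes` and `driver_share` are hypothetical and do not reflect the paper's actual classification scheme or data.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical record format: (node_id, crash_cause). The cause labels are
# illustrative assumptions, not the categories used in the paper.
Record = Tuple[str, str]


def rank_crash_causes(records: Iterable[Record]) -> List[Tuple[str, int]]:
    """Count crashes per cause and return (cause, count) pairs, most frequent first."""
    counts = Counter(cause for _node, cause in records)
    # most_common() sorts by count descending; equal counts keep first-seen order.
    return counts.most_common()


def driver_share(records: List[Record], driver_label: str = "device_driver") -> float:
    """Fraction of all recorded crashes attributed to the given driver label."""
    if not records:
        return 0.0
    return sum(1 for _node, cause in records if cause == driver_label) / len(records)


if __name__ == "__main__":
    # Toy input invented for illustration only; these are NOT results from Dawning4000A.
    log = [
        ("node01", "hardware"),
        ("node02", "device_driver"),
        ("node03", "hardware"),
        ("node04", "kernel_other"),
    ]
    print(rank_crash_causes(log))
    print(driver_share(log))
```

Ranking causes by frequency, rather than inspecting driver failures in isolation, is what allows a statement such as "driver errors are not the most critical crash cause" to be checked against the records.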