ReBucket: a method for clustering duplicate crash reports based on call stack similarity

  • Authors:
  • Yingnong Dang;Rongxin Wu;Hongyu Zhang;Dongmei Zhang;Peter Nobel

  • Affiliations:
  • Microsoft Research, China;Tsinghua University, China;Tsinghua University, China;Microsoft Research, China;Microsoft, USA

  • Venue:
  • Proceedings of the 34th International Conference on Software Engineering
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Software often crashes. Once a crash happens, a crash report could be sent to software developers for investigation upon user permission. To facilitate efficient handling of crashes, crash reports received by Microsoft's Windows Error Reporting (WER) system are organized into a set of "buckets". Each bucket contains duplicate crash reports that are deemed as manifestations of the same bug. The bucket information is important for prioritizing efforts to resolve crashing bugs. To improve the accuracy of bucketing, we propose ReBucket, a method for clustering crash reports based on call stack matching. ReBucket measures the similarities of call stacks in crash reports and then assigns the reports to appropriate buckets based on the similarity values. We evaluate ReBucket using crash data collected from five widely-used Microsoft products. The results show that ReBucket achieves better overall performance than the existing methods. In average, the F-measure obtained by ReBucket is about 0.88.