ReBucket: a method for clustering duplicate crash reports based on call stack similarity

Authors:
Yingnong Dang;Rongxin Wu;Hongyu Zhang;Dongmei Zhang;Peter Nobel
Affiliations:
Microsoft Research, China;Tsinghua University, China;Tsinghua University, China;Microsoft Research, China;Microsoft, USA
Venue:
Proceedings of the 34th International Conference on Software Engineering
Year:
2012

Abstract

Software often crashes. Once a crash happens, a crash report can be sent to the software's developers for investigation, with the user's permission. To facilitate efficient handling of crashes, crash reports received by Microsoft's Windows Error Reporting (WER) system are organized into a set of "buckets". Each bucket contains duplicate crash reports that are deemed to be manifestations of the same bug. The bucket information is important for prioritizing efforts to resolve crashing bugs. To improve the accuracy of bucketing, we propose ReBucket, a method for clustering crash reports based on call stack matching. ReBucket measures the similarities of call stacks in crash reports and then assigns the reports to appropriate buckets based on the similarity values. We evaluate ReBucket using crash data collected from five widely used Microsoft products. The results show that ReBucket achieves better overall performance than the existing methods. On average, the F-measure obtained by ReBucket is about 0.88.
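
The abstract describes two steps: score how similar two call stacks are, then group reports whose stacks score high enough. The sketch below illustrates that general idea only. The abstract does not specify ReBucket's similarity measure or clustering procedure, so the normalized longest-common-subsequence score, the greedy first-match grouping, the 0.6 threshold, and all function names here are assumptions chosen for illustration, not the authors' method.

```python
from typing import List, Sequence

Stack = Sequence[str]  # a call stack as a list of frame names, top frame first


def lcs_length(a: Stack, b: Stack) -> int:
    """Length of the longest common subsequence of frames in two stacks."""
    prev = [0] * (len(b) + 1)
    for frame_a in a:
        curr = [0]
        for j, frame_b in enumerate(b, start=1):
            if frame_a == frame_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Stack, b: Stack) -> float:
    """Stand-in similarity in [0, 1]: shared frames (in order) over the longer stack."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def bucket_reports(stacks: List[Stack], threshold: float = 0.6) -> List[List[int]]:
    """Greedy grouping: put each report in the first bucket whose first
    member is similar enough, otherwise open a new bucket.
    The threshold is a hypothetical value, not one taken from the paper."""
    buckets: List[List[int]] = []
    for i, stack in enumerate(stacks):
        for bucket in buckets:
            if similarity(stack, stacks[bucket[0]]) >= threshold:
                bucket.append(i)
                break
        else:
            buckets.append([i])
    return buckets
```

For example, given the stacks `["memcpy", "copy_buffer", "render", "main"]`, `["memcpy", "copy_buffer", "paint", "render", "main"]`, and `["free", "close_file", "shutdown", "main"]`, the first two share four frames in order (similarity 0.8) and land in one bucket, while the third shares only `main` with the first (similarity 0.25) and gets its own bucket. A greedy pass like this is order-dependent; a hierarchical clustering over the full pairwise similarity matrix would avoid that at higher cost.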