Natural Language Processing Based Detection of Duplicate Defect Patterns

Authors:
Qian Wu;Qianxiang Wang
Affiliations:
-;-
Venue:
COMPSACW '10 Proceedings of the 2010 IEEE 34th Annual Computer Software and Applications Conference Workshops
Year:
2010

Abstract

A defect pattern repository collects different kinds of defect patterns, which are general descriptions of the characteristics of commonly occurring software code defects. Defect patterns can be used by programmers, by static defect analysis tools, and even in runtime verification. Following the idea of Web 2.0, defect pattern repositories allow these users to submit the defect patterns they have found. However, the submission of duplicate patterns leads to redundancy in the repository. This paper introduces an approach that suggests potential duplicates based on natural language processing. Our approach first computes field similarities using the Vector Space Model, then employs information entropy to determine the importance of each field, and finally combines the field similarities into an overall defect pattern similarity. Two strategies are introduced to make the approach adaptive to special situations. Groups of duplicates are then obtained by hierarchical clustering. Evaluation indicates that our approach detects most of the actual duplicates in the repository (72% in our experiment).
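
The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact formulas, so several choices here are assumptions made for illustration. Field similarity is the cosine of raw term-frequency vectors, field weights are proportional to the Shannon entropy of each field's token distribution, clustering uses single linkage, and the field names (`name`, `description`), the sample patterns, and the 0.8 threshold are all hypothetical. The two adaptive strategies mentioned in the abstract are not modelled.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of a field (Vector Space Model, raw counts)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def field_entropy(texts):
    """Shannon entropy of the token distribution over all values of one field."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def field_weights(patterns, fields):
    """Assumption: a field's importance is proportional to its entropy."""
    ent = {f: field_entropy([p[f] for p in patterns]) for f in fields}
    total = sum(ent.values())
    return {f: (ent[f] / total if total else 1.0 / len(fields)) for f in fields}


def pattern_similarity(p, q, weights):
    """Weighted sum of per-field cosine similarities."""
    return sum(w * cosine(tf_vector(p[f]), tf_vector(q[f]))
               for f, w in weights.items())


def cluster(patterns, weights, threshold):
    """Single-linkage agglomerative clustering over pattern indices.

    Repeatedly merges the two most similar clusters and stops once the
    best available similarity drops below `threshold`.
    """
    clusters = [[i] for i in range(len(patterns))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = max(pattern_similarity(patterns[i], patterns[j], weights)
                        for i in clusters[a] for j in clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Hypothetical repository entries, for illustration only.
patterns = [
    {"name": "null pointer dereference",
     "description": "object used without null check"},
    {"name": "null pointer dereference",
     "description": "object used without a null check"},
    {"name": "resource leak",
     "description": "stream opened but never closed"},
]
weights = field_weights(patterns, ["name", "description"])
clusters = cluster(patterns, weights, threshold=0.8)
print(clusters)  # [[0, 1], [2]]
```

Any cluster with more than one member is a group of suggested duplicates. The threshold trades recall against false suggestions, and a real repository would likely need stop-word removal and TF-IDF weighting before the similarities become reliable.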