Using Semi-supervised Clustering to Improve Regression Test Selection Techniques

  • Authors:
  • Songyu Chen;Zhenyu Chen;Zhihong Zhao;Baowen Xu;Yang Feng

  • Venue:
  • ICST '11 Proceedings of the 2011 Fourth IEEE International Conference on Software Testing, Verification and Validation
  • Year:
  • 2011

Abstract

Cluster test selection has been proposed as an efficient regression testing approach. It uses distance measures and clustering algorithms to group tests into clusters, and tests in the same cluster are considered to have similar behaviors. A sampling strategy is then applied to the clustering result to build a small subset of tests that is expected to approximate the fault detection capability of the original test set. All existing cluster test selection methods employ unsupervised clustering: previous test results are not used in the clustering process, which may lead to unsatisfactory clustering results in some cases. In this paper, a semi-supervised clustering method, namely semi-supervised K-means (SSKM), is introduced to improve cluster test selection. SSKM uses limited supervision in the form of pairwise constraints: Must-link and Cannot-link. These pairwise constraints are derived from previous test results to improve clustering results and, in turn, test selection results. The experimental results illustrate the effectiveness of cluster test selection methods with SSKM. The analysis yields two useful observations. (1) Cluster test selection with SSKM is more effective when failed tests account for a medium proportion of the tests. (2) A strict definition of the pairwise constraints can improve the effectiveness of cluster test selection with SSKM.
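
The pipeline described in the abstract (derive pairwise constraints from previous test results, cluster under those constraints, then sample from each cluster) can be sketched as below. This is only an illustrative sketch, not the paper's SSKM: the abstract does not specify the algorithm, so the code uses a simple greedy constrained K-means in the style of COP-KMeans. The constraint rule (same previous verdict gives Must-link, different verdict gives Cannot-link), the 2-D execution profiles, the fixed initial centroids, and the helper names `derive_constraints`, `constrained_kmeans`, and `sample_per_cluster` are all assumptions made for the example.

```python
from itertools import combinations


def derive_constraints(known_results):
    """Build pairwise constraints from tests whose previous verdict is known.

    known_results maps test index -> True if the test failed previously.
    ASSUMED rule (not taken from the paper): same verdict -> must-link,
    different verdict -> cannot-link.
    """
    must_link, cannot_link = [], []
    for i, j in combinations(sorted(known_results), 2):
        if known_results[i] == known_results[j]:
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def violates(i, cluster, labels, must_link, cannot_link):
    """True if placing test i in `cluster` breaks a constraint with a test
    that has already been assigned in this pass."""
    for a, b in must_link:
        if i in (a, b):
            other = b if a == i else a
            if labels[other] is not None and labels[other] != cluster:
                return True
    for a, b in cannot_link:
        if i in (a, b):
            other = b if a == i else a
            if labels[other] == cluster:
                return True
    return False


def constrained_kmeans(profiles, init_centroids, must_link, cannot_link,
                       max_iter=50):
    """Greedy constrained K-means; raises if a test has no feasible cluster."""
    centroids = [tuple(c) for c in init_centroids]
    previous = None
    for _ in range(max_iter):
        labels = [None] * len(profiles)
        for i, p in enumerate(profiles):
            # Try clusters from nearest to farthest, skipping infeasible ones.
            order = sorted(range(len(centroids)),
                           key=lambda c: squared_distance(p, centroids[c]))
            for c in order:
                if not violates(i, c, labels, must_link, cannot_link):
                    labels[i] = c
                    break
            else:
                raise ValueError(f"no feasible cluster for test {i}")
        if labels == previous:  # assignments stable: converged
            break
        previous = labels
        for c in range(len(centroids)):
            members = [profiles[i] for i, l in enumerate(labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(d) / len(members)
                                     for d in zip(*members))
    return labels


def sample_per_cluster(labels, n=1):
    """Placeholder sampling strategy: take the first n tests of each cluster."""
    picked = []
    for c in sorted(set(labels)):
        picked.extend([i for i, l in enumerate(labels) if l == c][:n])
    return picked


# Hypothetical execution profiles for six tests; tests 0 and 1 failed
# previously, test 2 passed, and the rest have no recorded verdict.
profiles = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
must, cannot = derive_constraints({0: True, 1: True, 2: False})
labels = constrained_kmeans(profiles, [(0, 0), (5, 5)], must, cannot)
print(labels)                      # [0, 0, 1, 1, 1, 1]
print(sample_per_cluster(labels))  # [0, 2]
```

Without constraints, the same data and initial centroids would put test 2 in the same cluster as tests 0 and 1, because their profiles are close. The Cannot-link constraints push the previously passing test out of the cluster of previously failing tests, which is the kind of correction the abstract attributes to using previous test results.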