Improved Similarity Measures for Software Clustering

  • Authors:
  • Rashid Naseem; Onaiza Maqbool; Siraj Muhammad

  • Venue:
  • CSMR '11: Proceedings of the 2011 15th European Conference on Software Maintenance and Reengineering
  • Year:
  • 2011

Abstract

Software clustering is a useful technique for recovering the architecture of a software system. The results of clustering depend on the choice of entities, features, similarity measures, and clustering algorithms. Different similarity measures have been used to determine the similarity between entities during the clustering process. In the software architecture recovery domain, the Jaccard and the Unbiased Ellenberg measures have shown better results than other measures for binary and non-binary features, respectively. In this paper, we analyze the Russell and Rao measure for binary features to show the conditions under which it is expected to perform better than Jaccard. We also show how our proposed Jaccard-NM measure is suitable for software clustering and propose its counterpart for non-binary features. Experimental results indicate that our proposed Jaccard-NM measure and the Russell and Rao measure perform better than the Jaccard measure for binary features, while for non-binary features the proposed Unbiased Ellenberg-NM measure produces results that are closer to the decomposition prepared by experts.
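
To make the two existing binary measures named in the abstract concrete, here is a minimal sketch using their standard textbook definitions: Jaccard = a / (a + b + c) and Russell and Rao = a / (a + b + c + d), where a counts features present in both entities, b and c count features present in only one, and d counts features absent from both. The function names, the feature vectors, and the zero-denominator convention are assumptions made for illustration and are not taken from the paper. The abstract does not define Jaccard-NM or Unbiased Ellenberg-NM, so they are not shown.

```python
def contingency(x, y):
    """Count a, b, c, d for two equal-length binary feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)          # present in both
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)      # only in x
    c = sum(1 for xi, yi in zip(x, y) if not xi and yi)      # only in y
    d = sum(1 for xi, yi in zip(x, y) if not xi and not yi)  # absent in both
    return a, b, c, d


def jaccard(x, y):
    """Jaccard similarity: a / (a + b + c); ignores joint absences (d)."""
    a, b, c, _ = contingency(x, y)
    denom = a + b + c
    # Assumption: return 0.0 when neither entity has any feature.
    return a / denom if denom else 0.0


def russell_rao(x, y):
    """Russell and Rao similarity: a / (a + b + c + d)."""
    a, b, c, d = contingency(x, y)
    total = a + b + c + d
    return a / total if total else 0.0


# Hypothetical entities described by six binary features.
e1 = [1, 1, 1, 1, 0, 0]
e2 = [1, 1, 1, 1, 0, 0]   # shares four features with e1
e3 = [1, 0, 0, 0, 0, 0]
e4 = [1, 0, 0, 0, 0, 0]   # shares one feature with e3

print(jaccard(e1, e2), jaccard(e3, e4))
print(russell_rao(e1, e2), russell_rao(e3, e4))
```

In this toy input, Jaccard rates both pairs as equally similar, because each pair is identical on the features it has, whereas Russell and Rao rates the pair that shares four features higher than the pair that shares one. This only illustrates how the two formulas differ; it is not a statement of the conditions analyzed in the paper.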