Analyzing software effort estimation using k means clustered regression approach

  • Authors:
  • Geeta Nagpal;Moin Uddin;Arvinder Kaur

  • Affiliations:
  • National Institute of Technology, Jalandhar;Delhi Technological University, Delhi;University School of Information Technology, Indraprastha University, Delhi

  • Venue:
  • ACM SIGSOFT Software Engineering Notes
  • Year:
  • 2013

Abstract

Software estimation is an area where more assurances have been broken than in any other area of software development. Numerous studies have proposed new and reliable software effort estimation techniques, but no consensus has been reached on which techniques are the most appropriate. Because of the intangible nature of software, highly accurate effort estimation remains out of reach for developers. Very accurate estimates of development effort are unlikely because of the inherent uncertainty in software projects and the complex, dynamic interaction of the factors that affect software development. Software engineering datasets are heterogeneous because the data is obtained from diverse sources. This heterogeneity can be reduced by grouping related data points into clusters. This study examines how combining clustering with regression techniques can mitigate the loss of predictive accuracy caused by heterogeneous data. A clustered approach creates subsets of data with a degree of homogeneity that enhances prediction accuracy. The study also observed that ridge regression performs better than the other regression techniques considered. Another key finding is that selecting a subset of highly predictive attributes using Grey relational analysis yields a significant improvement in prediction.
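
The cluster-then-regress idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the farthest-first initialisation, the closed-form ridge solver, and the toy data are all assumptions chosen to keep the example short and reproducible. It groups projects with k-means, fits one ridge model per cluster, and predicts each project with the model of its nearest cluster center.

```python
import numpy as np

def nearest_center(X, centers):
    """Index of the closest center (Euclidean distance) for each row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialisation
    (an assumption made here so the example is reproducible)."""
    centers = X[:1].copy()
    while len(centers) < k:
        gap = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).min(axis=1)
        centers = np.vstack([centers, X[gap.argmax()]])
    for _ in range(n_iter):
        labels = nearest_center(X, centers)
        # Move each center to the mean of its members; keep it if it has none.
        updated = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
        if np.allclose(updated, centers):
            break
        centers = updated
    return centers

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def fit_clustered_ridge(X, y, k=2, alpha=1.0):
    """One ridge model per k-means cluster (assumes no cluster ends up empty)."""
    centers = kmeans(X, k)
    labels = nearest_center(X, centers)
    models = [fit_ridge(X[labels == j], y[labels == j], alpha) for j in range(k)]
    return centers, models

def predict(X, centers, models):
    """Predict each row with the model of its nearest cluster center."""
    labels = nearest_center(X, centers)
    return np.array([X[i] @ models[j][0] + models[j][1]
                     for i, j in enumerate(labels)])

# Hypothetical toy data: two groups of projects whose size-to-effort
# relationship differs, mimicking a heterogeneous dataset.
size = np.array([1, 2, 3, 4, 5, 101, 102, 103, 104, 105], dtype=float)
X = size.reshape(-1, 1)
y = np.where(size < 50, 2.0 * size + 1.0, 0.5 * size + 100.0)

centers, models = fit_clustered_ridge(X, y, k=2, alpha=1e-6)
clustered_mae = np.mean(np.abs(predict(X, centers, models) - y))

w, b = fit_ridge(X, y, alpha=1e-6)  # single model over all projects
global_mae = np.mean(np.abs(X @ w + b - y))
```

On this toy data the per-cluster models recover each group's own linear relationship, while the single global model has to compromise between the two, which is the effect of heterogeneity the abstract refers to. In practice the number of clusters and the ridge penalty `alpha` would need to be tuned, for example by cross-validation.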