Handling categorical variables in effort estimation

Authors:
Masateru Tsunoda;Sousuke Amasaki;Akito Monden
Affiliations:
Toyo University, Kawagoe, Japan;Okayama Prefectural University, Soja, Japan;Nara Institute of Science and Technology, Ikoma, Japan
Venue:
Proceedings of the ACM-IEEE international symposium on Empirical software engineering and measurement
Year:
2012

Abstract

Background: Accurate effort estimation is the basis of software development project management. The linear regression model is one of the most widely used methods for this purpose. A dataset used to build a model often includes categorical variables that denote attributes such as the programming language. Categorical variables are usually handled with one of two methods: stratification or dummy variables. These methods have a positive effect on accuracy but have shortcomings. Two other handling methods, the interaction and the hierarchical linear model (HLM), might be able to compensate for them. However, these two methods have not been examined in this research area. Aim: To give useful suggestions for handling categorical variables with stratification, transformation into dummy variables, the interaction, or HLM when building an estimation model. Method: We built estimation models with the four handling methods on the ISBSG, NASA, and Desharnais datasets and compared the accuracy of the methods with each other. Results: The most effective method differed among datasets, and the difference was statistically significant in both mean balanced relative error (MBRE) and mean magnitude of relative error (MMRE). The interaction and HLM were effective in a certain case. Conclusions: Stratification and transformation into dummy variables should at least be tried to obtain an accurate model. In addition, we suggest that the application of the interaction and HLM be considered when building the estimation model.
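
To make three of the handling methods named in the abstract concrete, the following minimal sketch shows how stratification, dummy variables, and the interaction each reshape the input to a linear regression. It is an illustration under stated assumptions, not the authors' implementation: the toy projects, the column layout, and the function names (`stratify`, `dummy_row`, `interaction_row`) are all hypothetical, the data are invented rather than drawn from ISBSG, NASA, or Desharnais, and HLM is omitted because it needs a mixed-effects fitting routine.

```python
from collections import defaultdict

# Hypothetical toy projects: (size, programming language, effort).
# These values are invented for illustration only.
PROJECTS = [
    (100, "COBOL", 900),
    (250, "Java", 1600),
    (180, "COBOL", 1500),
    (320, "C", 2900),
]


def stratify(projects):
    """Stratification: split the data by category so that a separate
    model can be fitted on each subset."""
    strata = defaultdict(list)
    for size, language, effort in projects:
        strata[language].append((size, effort))
    return dict(strata)


def dummy_row(size, language, levels):
    """Dummy variables: intercept, size, and one 0/1 column per level
    except the first, which serves as the baseline."""
    dummies = [1.0 if language == level else 0.0 for level in levels[1:]]
    return [1.0, float(size)] + dummies


def interaction_row(size, language, levels):
    """Interaction: the dummy-variable row plus size * dummy columns,
    which lets the slope on size differ between levels."""
    row = dummy_row(size, language, levels)
    return row + [size * d for d in row[2:]]


# Sorted category levels; the first one ("C") is the baseline.
levels = sorted({language for _, language, _ in PROJECTS})
```

Each row produced by `dummy_row` or `interaction_row` would form one line of the design matrix passed to a least-squares solver, whereas `stratify` yields one smaller dataset per language. The trade-off is visible in the shapes: stratification leaves few projects per model, dummy variables shift only the intercept, and the interaction adds one extra coefficient per non-baseline level.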