Ensemble missing data techniques for software effort prediction

Authors:
Bhekisipho Twala; Michelle Cartwright
Affiliations:
(Corresponding author. Tel.: +27 11 559 4404; Fax: +27 11 559 2357; E-mail: btwala@uj.ac.za) Department of Electrical and Electronic Engineering Science, University of Johannesburg, P.O. Box 524, Auckland Par ...; Brunel Software Engineering Research Centre, School of Information Systems, Computing and Mathematics, Brunel University, Uxbridge, UK
Venue:
Intelligent Data Analysis
Year:
2010

Abstract

Constructing an accurate effort prediction model is a challenge in software engineering. Developing and validating models for prediction tasks requires good-quality data. Unfortunately, software engineering datasets tend to be incomplete, which can lead to inaccurate decision making in project management and implementation. Recently, machine learning algorithms have proven to be of great practical value in solving a variety of software engineering problems, including software prediction; this includes the use of ensemble (combining) classifiers. Research indicates that combining individual classifiers into an ensemble, and having them vote for the most popular class, leads to a significant improvement in classification performance. This paper proposes a method for improving the software effort prediction accuracy of a decision tree learning algorithm by generating an ensemble with two imputation methods as its elements. Benchmarking results on ten industrial datasets show that the proposed ensemble strategy has the potential to improve prediction accuracy compared to an individual imputation method, especially if multiple imputation is a component of the ensemble.
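
The idea of an ensemble whose elements are imputation methods can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes mean imputation and nearest-neighbour imputation as the two elements (the abstract does not name the pair used here, and multiple imputation is not shown), a one-split decision stump standing in for the decision tree learner, and averaging of class proportions as the combination rule, since a plain majority vote between two members can tie. The feature names, the toy records, and all function names are hypothetical.

```python
from collections import Counter
from statistics import mean

def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    means = [mean(v for v in col if v is not None) for col in zip(*rows)]
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def knn_impute(rows):
    """Replace each None with the value from the nearest row that has it (1-NN).
    Assumes every column is observed in at least one other row."""
    def dist(a, b):
        # Mean squared difference over features observed in both rows.
        d = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
        return sum(d) / len(d) if d else float("inf")
    out = []
    for i, r in enumerate(rows):
        new = list(r)
        for j, v in enumerate(r):
            if v is None:
                donors = [d for k, d in enumerate(rows) if k != i and d[j] is not None]
                new[j] = min(donors, key=lambda d: dist(r, d))[j]
        out.append(new)
    return out

def fit_stump(X, y):
    """Fit a depth-1 tree: pick the (feature, threshold) with fewest training errors.
    Returns (feature, threshold, left_class_proportions, right_class_proportions)."""
    classes = sorted(set(y))
    def proportions(labels):
        c = Counter(labels)
        return {k: c[k] / len(labels) for k in classes}
    best = None
    for j in range(len(X[0])):
        values = sorted(set(r[j] for r in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for r, lab in zip(X, y) if r[j] <= t]
            right = [lab for r, lab in zip(X, y) if r[j] > t]
            errors = sum(len(s) - max(Counter(s).values()) for s in (left, right))
            if best is None or errors < best[0]:
                best = (errors, j, t, proportions(left), proportions(right))
    return best[1:]

def fit_ensemble(X_missing, y, imputers=(mean_impute, knn_impute)):
    """Train one stump per imputation method on the same incomplete data."""
    return [fit_stump(impute(X_missing), y) for impute in imputers]

def predict(ensemble, row):
    """Sum the members' class proportions and return the most supported class."""
    totals = Counter()
    for j, t, left, right in ensemble:
        for label, p in (left if row[j] <= t else right).items():
            totals[label] += p
    return max(sorted(totals), key=totals.get)

# Hypothetical toy records: [size_kloc, team_size]; None marks a missing value.
X = [[10, 3], [12, None], [None, 4], [50, 10], [55, None], [None, 12]]
y = ["low", "low", "low", "high", "high", "high"]  # effort class per project

ensemble = fit_ensemble(X, y)
print(predict(ensemble, [11, 3]), predict(ensemble, [52, 11]))
```

Each member sees a differently completed version of the same training set, so the members can disagree on borderline projects; combining their outputs is what gives the ensemble a chance to outperform either imputation method used alone.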