Missing data is a widespread problem that can undermine the use of data to build effective prediction systems. We investigate C4.5, a common machine learning technique that can tolerate missing values, for predicting cost on six real-world software project databases. We then analyze predictive performance after applying k-nearest neighbour (k-NN) imputation, to determine whether it is better to let C4.5 tolerate the missing data or to impute the missing values first and then apply C4.5. For this investigation we simulated three missingness mechanisms, three missing data patterns, and five missing data percentages. We found that k-NN imputation can improve the prediction accuracy of C4.5. We also found that both C4.5 and k-NN are little affected by the missingness mechanism, whereas the missing data pattern and the missing data percentage have a strong negative impact on prediction (or imputation) accuracy, particularly when the missing data percentage exceeds 40%.
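The experimental setup combines two steps that can be sketched in a few lines: deleting values at a chosen rate, and filling each gap from the most similar projects. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It simulates only MCAR (missing completely at random), one common missingness mechanism; the three mechanisms used in the study are not named here. The function names `inject_mcar` and `knn_impute` are hypothetical. Distances are Euclidean over attributes observed in both projects, and attributes are assumed to be already scaled to comparable ranges. For k > 1 the donors' values are averaged; other variants use the median or a distance-weighted mean.

```python
import math
import random
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def inject_mcar(data: List[List[float]], rate: float, cols: List[int], seed: int = 0) -> List[Row]:
    """Return a copy of `data` in which round(rate * cells) of the cells in
    `cols` are set to None, chosen uniformly at random (MCAR).
    Hypothetical helper for illustration."""
    rng = random.Random(seed)
    out: List[Row] = [list(row) for row in data]
    cells = [(i, j) for i in range(len(data)) for j in cols]
    for i, j in rng.sample(cells, round(rate * len(cells))):
        out[i][j] = None
    return out


def distance(a: Row, b: Row, skip: int) -> Optional[float]:
    """Euclidean distance over attributes observed in both rows, ignoring
    the attribute being imputed; None if the rows share no attribute."""
    shared = [(x, y) for k, (x, y) in enumerate(zip(a, b))
              if k != skip and x is not None and y is not None]
    if not shared:
        return None
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_impute(data: List[Row], k: int = 1) -> List[Row]:
    """Fill each missing cell with the mean of that attribute over the k
    nearest rows in which the attribute is observed. Hypothetical helper;
    cells with no usable donor are left as None."""
    out = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = []
            for m, other in enumerate(data):
                if m == i or other[j] is None:
                    continue
                d = distance(row, other, skip=j)
                if d is not None:
                    donors.append((d, other[j]))
            donors.sort(key=lambda t: t[0])  # nearest donors first
            nearest = donors[:k]
            if nearest:
                out[i][j] = sum(v for _, v in nearest) / len(nearest)
    return out
```

Setting `rate=0.4` corresponds to the 40% level beyond which the study reports accuracy degrading sharply. A MAR or NMAR variant would instead make each cell's deletion probability depend on other observed values or on the deleted value itself.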
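C4.5 is described as tolerating missing values partly through the way it scores a candidate split. As Quinlan's C4.5 is commonly described, the information gain is computed on the cases whose attribute value is known and then scaled by the fraction of cases that are known. The cases with unknown values are counted as one extra group when computing the split information. The sketch below shows only that scoring step, under stated assumptions. It assumes project cost has already been grouped into class labels, since C4.5 builds classification trees. It omits C4.5's fractional distribution of unknown cases across branches during training and classification. The function name `gain_ratio_with_missing` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio_with_missing(values: Sequence[Optional[Hashable]],
                            labels: Sequence[Hashable]) -> float:
    """Gain ratio of a discrete attribute whose unknown values are None.
    Hypothetical helper sketching the C4.5-style adjustment."""
    known = [(v, y) for v, y in zip(values, labels) if v is not None]
    if not known:
        return 0.0
    frac_known = len(known) / len(values)

    # Partition the known cases by attribute value.
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in known:
        groups.setdefault(v, []).append(y)

    # Gain on known cases only, scaled by the fraction of known cases.
    info_x = sum(len(g) / len(known) * entropy(g) for g in groups.values())
    gain = frac_known * (entropy([y for _, y in known]) - info_x)

    # Split information over all cases; unknown cases form one extra group.
    sizes = [len(g) for g in groups.values()]
    n_unknown = len(values) - len(known)
    if n_unknown:
        sizes.append(n_unknown)
    n = len(values)
    split = -sum((s / n) * math.log2(s / n) for s in sizes)
    return gain / split if split > 0 else 0.0
```

For example, an attribute that separates two cost classes perfectly scores 1.0 when every value is known. Blanking two of six values lowers its score, because both the gain and the split information are adjusted for the missing cases.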