Predicting which modules are likely to have faults during operations is important to software developers, so that software enhancement efforts can be focused on those modules that need improvement the most. Modeling software quality with classification trees is attractive because they readily model nonmonotonic relationships. In this paper, we apply the TREEDISC algorithm, which is a refinement of the CHAID algorithm, to build classification-tree models. CHAID-based algorithms differ from other classification-tree algorithms in their reliance on chi-squared tests when building the tree. Classification-tree models are vulnerable to overfitting, where the model reflects the structure of the training data set too closely. Even though a model appears to be accurate on training data, if overfitted, it may be much less accurate when applied to a current data set. To account for the severe consequences of misclassifying fault-prone modules, our measure of overfitting is based on expected costs of misclassification, rather than the total number of misclassifications. We conducted a case study of a very large telecommunications system. A two-way analysis of variance with repetitions found that TREEDISC's significance level was highly related to overfitting, and can be used to control it. Moreover, the minimum number of modules in a leaf also influenced the degree of overfitting.
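The abstract's cost-based measure of model quality can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, the per-module normalization, and the example costs are assumptions. It computes an expected cost of misclassification from the counts of Type I errors (a not-fault-prone module classified as fault-prone) and Type II errors (a fault-prone module classified as not fault-prone), weighting each type by its assumed cost, so that missing a fault-prone module can be penalized much more heavily than a false alarm.

```python
def expected_cost(n_type1, n_type2, n_total, cost1, cost2):
    """Expected cost of misclassification per module (illustrative sketch).

    n_type1 -- count of Type I errors (false alarms)
    n_type2 -- count of Type II errors (missed fault-prone modules)
    n_total -- total number of modules classified
    cost1, cost2 -- assumed costs of a Type I and Type II error
    """
    return (cost1 * n_type1 + cost2 * n_type2) / n_total

# Hypothetical numbers: 1000 modules, 50 false alarms, 10 missed
# fault-prone modules, with a missed module assumed 10x as costly.
ecm = expected_cost(50, 10, 1000, 1.0, 10.0)
print(ecm)  # (50*1.0 + 10*10.0) / 1000 = 0.15
```

Comparing this quantity on the training set versus a later data set, rather than comparing raw misclassification counts, is the spirit of the overfitting measure described above.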
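The reliance of CHAID-based algorithms on chi-squared tests can also be sketched. The following is an illustrative Pearson chi-squared statistic for a contingency table of observed counts (e.g. candidate split groups versus fault-prone / not-fault-prone class), not the TREEDISC implementation; in a CHAID-style tree builder, splits whose statistic is not significant at the chosen level are rejected, which is why the significance level acts as a control on tree growth.

```python
def chi2_statistic(table):
    """Pearson chi-squared statistic for a table of observed counts.

    table -- list of rows, each a list of nonnegative counts,
             e.g. rows = split groups, columns = quality classes.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical 2x2 table: two candidate split groups (rows) against
# fault-prone / not-fault-prone counts (columns).
stat = chi2_statistic([[30, 10], [10, 30]])
print(stat)  # every expected cell is 20, so the statistic is 20.0
```

A larger statistic (relative to the chi-squared distribution at the chosen significance level) indicates a stronger association between the candidate split and the class, making the split more likely to be accepted.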