The KDD process for extracting useful knowledge from volumes of data
Communications of the ACM
Software Metrics: A Rigorous Approach
Software Metrics: A Rigorous Approach
Data Mining: Introductory and Advanced Topics
Data Mining: Introductory and Advanced Topics
Discretization: An Enabling Technique
Data Mining and Knowledge Discovery
Machine Learning
Potter's Wheel: An Interactive Data Cleaning System
Proceedings of the 27th International Conference on Very Large Data Bases
Knowledge Discovery in Databases: An Attribute-Oriented Approach
VLDB '92 Proceedings of the 18th International Conference on Very Large Data Bases
Genetic Programming-Based Decision Trees for Software Quality Classification
ICTAI '03 Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Detecting noisy instances with the rule-based classification model
Intelligent Data Analysis
Data mining in soft computing framework: a survey
IEEE Transactions on Neural Networks
Accuracy and efficiency comparisons of single- and multi-cycled software classification models
Information and Software Technology
Diagnosing Cardiovascular Disease Using an Enhanced Rough Sets Approach
Applied Artificial Intelligence
A hybrid model based on rough sets theory and genetic algorithms for stock price forecasting
Information Sciences: an International Journal
Application of decision tree based on C4.5 in analysis of coal logistics customer
IITA'09 Proceedings of the 3rd international conference on Intelligent information technology application
Defective software modules cause software failures, increase development and maintenance costs, and reduce customer satisfaction. Effective defect prediction models can help developers focus quality assurance activities on defect-prone modules and thus improve software quality by using resources more efficiently. Real-world databases are highly susceptible to noisy, missing, and inconsistent data. Noise is a random error or variance in a measured variable [Han, J., & Kamber, M. (2001). Data Mining: Concepts and Techniques. San Francisco: Morgan Kaufmann Publishers]. When decision trees are built, many of the branches may reflect noisy or outlier data, so data preprocessing is an important step. Among the many preprocessing methods, concept hierarchies are a form of data discretization that can be used for this purpose. Discretization reduces and simplifies the data, and results obtained using discrete features are usually more compact, shorter, and more accurate than those obtained using continuous ones [Liu, H., Hussain, F., Tan, C.L., & Dash, M. (2002). Discretization: An enabling technique. Data Mining and Knowledge Discovery, 6(4), 393-423]. In this paper, we propose a modified minimize entropy principle approach (MEPA) and develop a modified MEPA system to partition the data and then build the classification tree model. For verification, two NASA software projects, KC2 and JM1, are used to illustrate the proposed method. We establish a prototype system to discretize the data from these projects. The error rate and the number of rules show that the proposed approach performs better than the other methods on both measures.
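To make the idea of entropy-based partitioning concrete, the following is a minimal Python sketch of the standard entropy-minimizing binary split on a single continuous attribute. It is not the modified MEPA proposed in the paper, whose details the abstract does not give. The function names `entropy` and `best_cut_point`, the sample metric values, and the labels `"clean"` and `"defective"` are all illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def best_cut_point(values, labels):
    """Find the threshold that minimizes the weighted class entropy.

    Returns (threshold, weighted_entropy), or None when all values are equal
    and no split is possible. Hypothetical helper for illustration only.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best = None
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no boundary between identical values
        threshold = (low + high) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Made-up values of one software metric for six modules (not KC2 or JM1 data).
metric = [1, 2, 3, 10, 11, 12]
status = ["clean", "clean", "clean", "defective", "defective", "defective"]
cut = best_cut_point(metric, status)
print(cut)
```

Applying the same search recursively to each resulting interval would yield a multi-interval discretization; the discretized attribute could then be passed to a decision tree learner.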