Improved C4.5 algorithm for rule based classification

Authors:
Mohammed M. Mazid;A. B. M. Shawkat Ali;Kevin S. Tickle
Affiliations:
School of Computing Science, Central Queensland University, Australia;School of Computing Science, Central Queensland University, Australia;School of Computing Science, Central Queensland University, Australia
Venue:
AIKED'10 Proceedings of the 9th WSEAS international conference on Artificial intelligence, knowledge engineering and data bases
Year:
2010

Abstract

C4.5 is one of the most popular algorithms for rule-based classification. It has many empirically useful features, such as categorization of continuous values and handling of missing values. However, in many cases it requires long processing time and achieves a low rate of correctly classified instances. Moreover, a large dataset may contain hundreds of attributes, and the most relevant ones must be chosen for C4.5 to reach higher accuracy. Choosing a suitable algorithm for efficient and accurate classification is also a difficult task. Our proposed method selects the most relevant attributes of a dataset, reducing the input space while improving the performance of C4.5. The improvement is measured as higher accuracy and lower computational complexity. We use entropy, as defined in information theory, to identify the central attribute of a dataset. We then apply a correlation coefficient measure, namely Pearson's, Spearman's, or Kendall's, using the central attribute of the same dataset. We conduct a comparative study of these three widely used correlation coefficients to choose the best measure on eight well-known data mining problems from the UCI (University of California, Irvine) data repository. We use box plots to compare the experimental results. The proposed method shows better performance in most of the individual experiments.
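
The attribute selection idea in the abstract can be illustrated with a small sketch. The abstract does not say how the central attribute is picked from the entropy values, nor which correlation cutoff is used, so the following are assumptions made only for illustration: the central attribute is taken to be the one with the highest Shannon entropy, and an attribute is kept when the absolute value of its correlation with the central attribute reaches a hypothetical threshold of 0.5. The function names and the toy data are likewise invented here and are not the authors' implementation.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits, treating each distinct value as a category."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson product-moment correlation (0.0 if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall(x, y):
    """Kendall tau-b, computed over all pairs in O(n^2)."""
    conc = disc = only_x = only_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue          # tied in both: counted nowhere
            elif dx == 0:
                only_x += 1       # tied in x only
            elif dy == 0:
                only_y += 1       # tied in y only
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    denom = math.sqrt((conc + disc + only_x) * (conc + disc + only_y))
    return (conc - disc) / denom if denom else 0.0


def select_attributes(data, corr=pearson, threshold=0.5):
    """Return (central attribute, attributes kept).

    ASSUMPTIONS: central = highest-entropy attribute; keep an attribute
    when |corr(attribute, central)| >= threshold. Neither rule is stated
    in the abstract.
    """
    central = max(data, key=lambda name: entropy(data[name]))
    kept = [name for name in data
            if name != central
            and abs(corr(data[name], data[central])) >= threshold]
    return central, kept


# Made-up toy data, not one of the UCI datasets used in the paper.
toy = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [1, 1, 2, 2, 3, 3],
    "c": [1, 0, 1, 0, 1, 0],
}

for measure in (pearson, spearman, kendall):
    print(measure.__name__, select_attributes(toy, corr=measure))
```

In this sketch the reduced attribute set would then be passed to a C4.5 implementation, and the three measures could be compared by the accuracy and run time obtained with each. Continuous attributes would need to be discretized before the entropy step, since every distinct value is counted as its own category here.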