Input space reduction for rule based classification

Authors:
Mohammed M. Mazid;A. B. M. Shawkat Ali;Kevin S. Tickle
Affiliations:
School of Computing Science, Central Queensland University, Australia;School of Computing Science, Central Queensland University, Australia;School of Computing Science, Central Queensland University, Australia
Venue:
WSEAS Transactions on Information Science and Applications
Year:
2010

Abstract

Rule-based classification is one of the most popular approaches to classification in data mining. There are a number of algorithms for rule-based classification. C4.5 and Partial Decision Tree (PART) are among the most popular, and both have many empirical features such as categorization of continuous values and missing-value handling. However, in many cases these algorithms take considerable processing time and achieve a low rate of correctly classified instances. One of the main reasons is the high dimensionality of the databases. A large dataset might contain hundreds of attributes and a huge number of instances. To obtain higher accuracy, we need to choose the most relevant attributes among them. It is also difficult to choose a proper algorithm that performs efficient and accurate classification. With our proposed method, we select the most relevant attributes from a dataset by reducing the input space, and simultaneously improve the performance of these two rule-based algorithms. The improvement is measured in terms of higher accuracy and lower computational complexity. We use entropy, from information theory, to identify the central attribute of a dataset. We then apply a correlation coefficient measure, namely Pearson's, Spearman's, or Kendall's correlation, using the central attribute of the same dataset. We have conducted a comparative study of these three popular correlation coefficient measures to choose the best one. The datasets are taken from the well-known UCI (University of California, Irvine) data repository. We use box plots to compare the experimental results. Our proposed method showed better performance in most of the individual experiments.
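
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions the abstract does not settle: the central attribute is taken to be the one with the highest Shannon entropy, attributes are ranked by the absolute value of their correlation with it, and a fixed number of top-ranked attributes is kept. The function names, the `keep` parameter, the use of Kendall's tau-a, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def average_ranks(x):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def select_attributes(columns, measure=pearson, keep=2):
    """columns maps attribute name -> list of numeric values."""
    # ASSUMPTION: the central attribute is the one with maximum entropy.
    central = max(columns, key=lambda name: shannon_entropy(columns[name]))
    # ASSUMPTION: relevance = |correlation with the central attribute|.
    scores = {
        name: abs(measure(columns[central], vals))
        for name, vals in columns.items()
        if name != central
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return central, ranked[:keep]


# Hypothetical toy data, not taken from the UCI repository.
columns = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 10],
    "c": [1, 1, 2, 2, 1, 1],
    "d": [6, 5, 4, 3, 2, 2],
}
central, selected = select_attributes(columns, measure=pearson, keep=2)
print("central attribute:", central)
print("selected attributes:", selected)
```

Passing `measure=spearman` or `measure=kendall_tau_a` swaps the correlation coefficient, which mirrors the three-way comparison the abstract describes. The reduced attribute set would then be handed to a rule-based learner such as C4.5 or PART, which is outside the scope of this sketch.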