Rule-based classification is one of the most popular approaches to classification in data mining, and a number of algorithms implement it. C4.5 and the Partial Decision Tree (PART) algorithm are among the most widely used, and both offer practical features such as discretization of continuous values and missing-value handling. In many cases, however, these algorithms require more processing time and achieve a lower accuracy on correctly classified instances. One of the main reasons is the high dimensionality of the databases: a large dataset may contain hundreds of attributes and a huge number of instances, so the most relevant attributes must be selected to obtain higher accuracy. Choosing a proper algorithm for efficient and accurate classification is also a difficult task. With our proposed method, we select the most relevant attributes of a dataset by reducing the input space, and simultaneously improve the performance of these two rule-based algorithms; the improvement is measured as better accuracy and lower computational complexity. We compute the entropy of information theory to identify the central attribute of a dataset, then apply a correlation coefficient measure, namely Pearson's, Spearman's, or Kendall's correlation, between the central attribute and the remaining attributes of the same dataset. We conducted a comparative study of these three popular correlation coefficient measures to choose the best one. The datasets were taken from the well-known UCI (University of California, Irvine) repository, and box plots were used to compare the experimental results. Our proposed method showed better performance in most of the individual experiments.
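The selection step described above can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes the "central" attribute is the one with maximum Shannon entropy, uses Pearson's coefficient for the ranking (Spearman's and Kendall's are included for comparison), and the cutoff parameter `k` is a hypothetical choice; the paper does not specify these details.

```python
import math
from itertools import combinations

def entropy(values):
    """Shannon entropy (in bits) of a discrete attribute."""
    n = len(values)
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:       # constant attribute: undefined, treat as 0
        return 0.0
    return cov / (sx * sy)

def ranks(x):
    """Average ranks (ties share their mean rank), for Spearman's rho."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson's coefficient applied to ranks."""
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    """Kendall's tau (tau-a, no tie correction, for simplicity)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

def select_attributes(columns, k):
    """Pick the highest-entropy attribute as the central one (an assumed
    reading of the method), then keep the k attributes most correlated
    with it by absolute Pearson coefficient."""
    central = max(columns, key=lambda name: entropy(columns[name]))
    scores = {name: abs(pearson(columns[central], columns[name]))
              for name in columns if name != central}
    chosen = sorted(scores, key=scores.get, reverse=True)[:k]
    return central, chosen
```

The reduced attribute set returned by `select_attributes` would then be fed to C4.5 or PART in place of the full input space; swapping `pearson` for `spearman` or `kendall` in the ranking reproduces the three variants compared in the study.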