C4.5: programs for machine learning
C4.5: programs for machine learning
Machine Learning - Special issue on learning with probabilistic representations
Data mining: practical machine learning tools and techniques with Java implementations
Data mining: practical machine learning tools and techniques with Java implementations
The Case against Accuracy Estimation for Comparing Induction Algorithms
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Toward Bayesian Classifiers with Accurate Probabilities
PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Tree Induction for Probability-Based Ranking
Machine Learning
Journal of Artificial Intelligence Research
Induction of selective Bayesian classifiers
UAI'94 Proceedings of the Tenth international conference on Uncertainty in artificial intelligence
Naive Bayes for optimal ranking
Journal of Experimental & Theoretical Artificial Intelligence
A Combined Classification Algorithm Based on C4.5 and NB
ISICA '08 Proceedings of the 3rd International Symposium on Advances in Computation and Intelligence
Linking Bayesian networks and PLS path modeling for causal analysis
Expert Systems with Applications: An International Journal
One Dependence Value Difference Metric
Knowledge-Based Systems
Naive Bayes has been widely used in data mining as a simple and effective classification algorithm. Since its conditional independence assumption is rarely true, numerous algorithms have been proposed to improve naive Bayes, among which tree augmented naive Bayes (TAN) [3] achieves a significant improvement in terms of classification accuracy while maintaining efficiency and model simplicity. In many real-world data mining applications, however, an accurate ranking is more desirable than a classification. It is therefore interesting to ask whether TAN also achieves a significant improvement in terms of ranking, measured by AUC (the area under the Receiver Operating Characteristics curve) [8,1]. Unfortunately, our experiments show that TAN performs even worse than naive Bayes in ranking. In response, we present a novel learning algorithm, called forest augmented naive Bayes (FAN), obtained by modifying the traditional TAN learning algorithm. We experimentally test our algorithm on all 36 data sets recommended by Weka [12], and compare it to naive Bayes, SBC [6], TAN [3], and C4.4 [10] in terms of AUC. The experimental results show that our algorithm significantly outperforms all the other algorithms in yielding accurate rankings. Our work provides an effective and efficient data mining algorithm for applications in which an accurate ranking is required.
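To make the ranking criterion concrete, the following is a minimal sketch of how AUC can be computed for a two-class problem from predicted scores, using the standard pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as one half. The function name `auc` and the toy scores are illustrative assumptions and are not taken from the paper, which may use a different (for example multi-class) variant.

```python
def auc(scores, labels):
    """Pairwise AUC for binary labels (1 = positive, 0 = negative).

    Illustrative sketch only: counts, over all positive/negative pairs,
    how often the positive example is scored higher (ties count 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Hypothetical class-membership probabilities from some classifier.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(auc(scores, labels))  # 0.75
```

Because AUC depends only on the order of the scores, a classifier can have high accuracy and still rank poorly, which is why accuracy and AUC are evaluated separately here.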