Learning tree augmented naive Bayes for ranking

Authors:
Liangxiao Jiang; Harry Zhang; Zhihua Cai; Jiang Su
Affiliations:
Department of Computer Science, China University of Geosciences, Wuhan, China; Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada; Department of Computer Science, China University of Geosciences, Wuhan, China; Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada
Venue:
DASFAA'05 Proceedings of the 10th international conference on Database Systems for Advanced Applications
Year:
2005

Citing 10

C4.5: programs for machine learning
Bayesian Network Classifiers. Machine Learning - Special issue on learning with probabilistic representations
Data mining: practical machine learning tools and techniques with Java implementations
A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems. Machine Learning
The Case against Accuracy Estimation for Comparing Induction Algorithms. ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Toward Bayesian Classifiers with Accurate Probabilities. PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Tree Induction for Probability-Based Ranking. Machine Learning
Learning to order things. Journal of Artificial Intelligence Research
The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition
Induction of selective Bayesian classifiers. UAI'94 Proceedings of the Tenth international conference on Uncertainty in artificial intelligence

Cited 4

Naive Bayes for optimal ranking. Journal of Experimental & Theoretical Artificial Intelligence
A Combined Classification Algorithm Based on C4.5 and NB. ISICA '08 Proceedings of the 3rd International Symposium on Advances in Computation and Intelligence
Linking Bayesian networks and PLS path modeling for causal analysis. Expert Systems with Applications: An International Journal
One Dependence Value Difference Metric. Knowledge-Based Systems

Abstract

Naive Bayes has been widely used in data mining as a simple and effective classification algorithm. Because its conditional independence assumption rarely holds, numerous algorithms have been proposed to improve it, among which tree augmented naive Bayes (TAN) [3] achieves a significant improvement in terms of classification accuracy while maintaining efficiency and model simplicity. In many real-world data mining applications, however, an accurate ranking is more desirable than a classification. It is therefore interesting to ask whether TAN also achieves a significant improvement in terms of ranking, measured by AUC (the area under the Receiver Operating Characteristics curve) [8,1]. Unfortunately, our experiments show that TAN performs even worse than naive Bayes in ranking. In response, we present a novel learning algorithm, called forest augmented naive Bayes (FAN), obtained by modifying the traditional TAN learning algorithm. We experimentally test our algorithm on all 36 data sets recommended by Weka [12] and compare it to naive Bayes, SBC [6], TAN [3], and C4.4 [10] in terms of AUC. The experimental results show that our algorithm significantly outperforms all the other algorithms in yielding accurate rankings. Our work provides an effective and efficient data mining algorithm for applications in which an accurate ranking is required.
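
To make the ranking criterion concrete, the following is a minimal sketch of how AUC can be computed for a two-class problem from the scores a classifier assigns to instances. It is an illustration only, not the evaluation code used in the paper: the function name auc_from_scores and the toy data are assumptions, and the multi-class generalisation cited as [8] is not covered.

```python
def auc_from_scores(scores, labels):
    """Two-class AUC as the fraction of (positive, negative) pairs
    in which the positive instance receives the higher score.
    Ties count as half a correctly ordered pair.

    Hypothetical helper for illustration; labels are 1 (positive)
    or 0 (negative).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    correctly_ordered = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correctly_ordered += 1.0
            elif p == n:
                correctly_ordered += 0.5
    return correctly_ordered / (len(positives) * len(negatives))


# Toy example: class-membership probabilities from some classifier.
scores = [0.9, 0.2, 0.8, 0.1]
labels = [1, 1, 0, 0]
print(auc_from_scores(scores, labels))
```

A classifier can label every instance correctly at a 0.5 threshold and still order instances poorly within each class, which is why accuracy and AUC can disagree when comparing algorithms such as naive Bayes and TAN.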