Probability Based Metrics for Locally Weighted Naive Bayes

  • Authors:
  • Bin Wang; Harry Zhang

  • Affiliations:
  • Faculty of Computer Science, University of New Brunswick, P.O. Box 4400, Fredericton, NB, E3B 5A3, Canada (both authors)

  • Venue:
  • CAI '07 Proceedings of the 20th conference of the Canadian Society for Computational Studies of Intelligence on Advances in Artificial Intelligence
  • Year:
  • 2007


Abstract

Locally weighted naive Bayes (LWNB) is a successful instance-based classifier: it first finds the neighbors of the test instance using the Euclidean metric, and then builds a naive Bayes model in the local neighborhood. However, the Euclidean metric is not the best choice for LWNB. For nominal attributes, the Euclidean metric must impose an ordering and numbering on attribute values, or merely judge whether two values are identical. For numeric attributes, it handles differing attribute scales and variability poorly, and suffers from attribute-value outliers when normalizing values. In this paper, we systematically study probability-based metrics, namely the Interpolated Value Difference Metric (IVDM), the Extended Short and Fukunaga Metric (SF2), SF2 calibrated by logarithm (SF2LOG), and the Minimum Risk Metric (MRM), and apply them to LWNB. These probability-based metrics avoid the above problems of the Euclidean metric because they evaluate the distances between instances from differences between probability estimates. We conduct experiments on UCI datasets comparing the performance of LWNB classifiers using the Euclidean metric and the probability-based metrics. The results show that LWNB classifiers using IVDM outperform those using the Euclidean metric and the other probability-based metrics. We also observe that SF2, SF2LOG and MRM do not perform well, due to their inaccurate probability estimates. To examine this, we build an artificial dataset by logical sampling from a Bayesian network, in which accurate probability estimates can be produced, and repeat the experiment on it. The results show that SF2, SF2LOG and MRM with accurate probability estimates perform better than the Euclidean metric and IVDM in LWNB.
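To make the core idea concrete, the sketch below implements the plain Value Difference Metric (VDM) for nominal attributes, the metric on which IVDM builds: the distance between two attribute values is the summed difference between their class-conditional probability estimates, so no ordering or numbering of nominal values is needed. This is a minimal illustration, not the paper's implementation; the function names `fit_vdm` and `vdm` and the toy data are assumptions introduced here.

```python
from collections import Counter, defaultdict


def fit_vdm(X, y):
    """Estimate P(class | attribute value) tables from training data.

    X: list of instances (each a list of nominal attribute values).
    y: list of class labels.
    Returns one probability table per attribute, plus the class set.
    """
    n_attrs = len(X[0])
    classes = sorted(set(y))
    tables = []
    for a in range(n_attrs):
        counts = defaultdict(Counter)  # attribute value -> class counts
        for row, c in zip(X, y):
            counts[row[a]][c] += 1
        # Normalize counts into conditional probability estimates.
        probs = {v: {c: cc[c] / sum(cc.values()) for c in classes}
                 for v, cc in counts.items()}
        tables.append(probs)
    return tables, classes


def vdm(tables, classes, x1, x2, q=2):
    """VDM distance: sum over attributes and classes of
    |P(c | v1) - P(c | v2)| ** q. Unseen values get probability 0."""
    d = 0.0
    for a, (v1, v2) in enumerate(zip(x1, x2)):
        p1 = tables[a].get(v1, {})
        p2 = tables[a].get(v2, {})
        for c in classes:
            d += abs(p1.get(c, 0.0) - p2.get(c, 0.0)) ** q
    return d


# Toy usage: two nominal attributes, two classes.
X = [["red", "small"], ["red", "small"], ["blue", "large"], ["blue", "large"]]
y = [0, 0, 1, 1]
tables, classes = fit_vdm(X, y)
print(vdm(tables, classes, X[0], X[1]))  # identical instances -> 0.0
print(vdm(tables, classes, X[0], X[2]))  # maximally different -> 4.0
```

In LWNB, such a metric would replace the Euclidean distance when ranking training instances by closeness to the test instance; IVDM extends it to numeric attributes by interpolating probabilities between discretized bins.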