A Study of the Neighborhood Counting Similarity

Authors:
Hui Wang;Fionn Murtagh
Affiliations:
-;-
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2008

Citing 0
Cited 1

Lattice Machine Classification based on Contextual Probability

Fundamenta Informaticae - To Andrzej Skowron on His 70th Birthday

Quantified Score

Hi-index	0.00

Visualization

Abstract

A novel similarity, neighborhood counting measure, was recently proposed which counts the number of neighborhoods of a pair of data points. This similarity can handle numerical and categorical attributes in a conceptually uniform way, can be calculated efficiently through a simple formula, and gives good performance when tested in the framework of k-nearest neighbor classifier. In particular it consistently outperforms a combination of the classical Euclidean distance and Hamming distance. This measure was also shown to be related to a probability formalism, G probability, which is induced from a target probability function P. It was however unclear how G is related to P, especially for classification. Therefore it was not possible to explain some characteristic features of the neighborhood counting measure. In this paper we show that G is a linear function of P, and G-based Bayes classification is equivalent to P-based Bayes classification. We also show that the k-nearest neighbor classifier, when weighted by the neighborhood counting measure, is in fact an approximation of the G-based Bayes classifier, and furthermore, the P-based Bayes classifier. Additionally we show that the neighborhood counting measure remains unchanged when binary attributes are treated as categorical or numerical data. This is a feature that is not shared by other distance measures, to the best of our knowledge. This study provides a theoretical insight into the neighborhood counting measure.