The heterogeneous Euclidean-overlap metric (HEOM) and the heterogeneous value difference metric (HVDM), both introduced in the machine learning literature, are useful for handling mixed-type data in machine learning, pattern recognition and data mining tasks. Mixed-type variables are common in practical problems, yet pattern recognition, data mining and decision-making algorithms seldom take this property into account. We found a special situation in which these two distance measures are not metrics but pseudometrics, a property to keep in mind when using them. Nevertheless, with slight changes to their definitions, both can be made to satisfy the metric axioms. The redefinition may matter especially in medical applications, since otherwise it is theoretically possible that, for example, two identical cases are classified differently. Nearest neighbor search tests on medical data were run to illustrate the behavior of these measures. Despite violating metricity, the original forms yielded slightly better classification results. The reason was that the real data sets tested contained very few nearly identical cases according to these distance measures, and the original forms, which produce more widely separated distances than the redefinitions, were therefore slightly better in classification.
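To make the discussion of the measures concrete, the following is a minimal sketch of HEOM as it is commonly defined in the machine learning literature: per-attribute distances are the overlap (0 or 1) for nominal attributes, the range-normalized absolute difference for numeric attributes, and 1 whenever either value is missing. The function name `heom`, the use of `None` for a missing value, and the example attributes and ranges are illustrative assumptions, not taken from the paper. Whether missing values are the "special situation" the abstract refers to is also an assumption; the sketch only shows one way a case can end up at nonzero distance from itself.

```python
import math


def heom(x, y, ranges, nominal):
    """Sketch of the heterogeneous Euclidean-overlap measure.

    x, y    : sequences of attribute values; None marks a missing value
              (an assumed encoding for this example)
    ranges  : dict mapping numeric attribute index -> (max - min) of that attribute
    nominal : set of indices of nominal attributes
    """
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is None or ya is None:
            d = 1.0                          # missing value: maximal attribute distance
        elif a in nominal:
            d = 0.0 if xa == ya else 1.0     # overlap distance for nominal values
        else:
            d = abs(xa - ya) / ranges[a]     # range-normalized numeric difference
        total += d * d
    return math.sqrt(total)


# Hypothetical cases: (temperature, sex, systolic pressure)
ranges = {0: 10.0, 2: 50.0}
nominal = {1}

complete = (37.5, "male", 120.0)
incomplete = (37.5, "male", None)

d_complete = heom(complete, complete, ranges, nominal)        # 0.0
d_incomplete = heom(incomplete, incomplete, ranges, nominal)  # 1.0
```

Under these assumptions, a complete case is at distance 0 from itself, while a case with one missing value is at distance 1 from an identical copy. In a nearest neighbor search this means an identical stored case is not guaranteed to be the closest one, which is in line with the abstract's remark that identical cases could in theory be classified differently.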