Generalization of the Mahalanobis distance in the mixed case
Journal of Multivariate Analysis
In this paper, we compare three measures for computing Mahalanobis-type distances between random variables with several categorical dimensions, or with mixed categorical and numeric dimensions: the regular simplex, tensor product space, and symbolic covariance methods. The tensor product space and symbolic covariance distances are new contributions. We test the methods in two application domains, classification and principal components analysis. We find that the tensor product space distance is impractical for most problems. Overall, the regular simplex method is the most successful in both domains, but the symbolic covariance method has several advantages, including time and space efficiency, applicability to different contexts, and theoretical neatness.
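The regular simplex idea can be illustrated with a short sketch. A categorical attribute with k levels is mapped to the vertices of a regular simplex, so that all pairs of distinct levels are equidistant; one-hot vectors are one such set of vertices. The encoded categorical columns are then stacked with the numeric ones and an ordinary Mahalanobis distance is computed, using a pseudo-inverse of the covariance matrix because one-hot encoding makes it rank-deficient. The function names below are illustrative, not the paper's code, and the construction is a minimal sketch rather than the authors' exact method.

```python
import numpy as np

def simplex_encode(column):
    """One-hot encode a categorical column. The k one-hot vectors are
    vertices of a regular simplex: every pair of distinct levels sits
    at the same distance (sqrt(2)) from every other pair."""
    levels = sorted(set(column))
    index = {v: i for i, v in enumerate(levels)}
    out = np.zeros((len(column), len(levels)))
    for row, v in enumerate(column):
        out[row, index[v]] = 1.0
    return out

def mixed_mahalanobis(X_num, X_cat_columns):
    """Stack numeric columns with simplex-encoded categorical columns
    and return the pairwise Mahalanobis distance matrix. A pseudo-inverse
    handles the rank deficiency introduced by one-hot encoding."""
    blocks = [np.asarray(X_num, dtype=float)]
    for col in X_cat_columns:
        blocks.append(simplex_encode(col))
    Z = np.hstack(blocks)
    VI = np.linalg.pinv(np.atleast_2d(np.cov(Z, rowvar=False)))
    diff = Z[:, None, :] - Z[None, :, :]
    quad = np.einsum('ijk,kl,ijl->ij', diff, VI, diff)
    return np.sqrt(np.maximum(quad, 0.0))  # clip tiny negative round-off
```

For example, `mixed_mahalanobis([[1.0], [2.0], [3.0], [4.0]], [['a', 'a', 'b', 'b']])` yields a symmetric 4x4 distance matrix with a zero diagonal, combining the numeric column and one two-level categorical attribute in a single quadratic form.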