Measuring the similarity between nominal objects is not straightforward, especially in unsupervised learning. This paper proposes coupled similarity metrics for nominal objects that consider not only intra-coupled similarity within an attribute (i.e., value frequency distribution) but also inter-coupled similarity between attributes (i.e., feature dependency aggregation). Four metrics are designed to calculate the inter-coupled similarity between two categorical values by considering their relationships with other attributes. Theoretical analysis shows that the four metrics are equally accurate and that the intersection-based metric is more efficient than the others, particularly for large-scale data. Substantial experiments on a wide range of UCI data sets verify the theoretical conclusions. In addition, clustering experiments based on the derived dissimilarity metrics show a significant performance improvement.
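To make the two ingredients concrete, the following is a minimal Python sketch, not the paper's reference implementation. The abstract does not state the formulas, so the exact forms used here are assumptions: the intra-coupled score is taken to be a function of the two values' occurrence counts, and the inter-coupled score sums, over the values of another attribute that co-occur with both inputs (the intersection), the smaller of the two conditional probabilities. The function names `intra_similarity`, `conditional_distribution`, and `inter_similarity` are hypothetical.

```python
from collections import Counter, defaultdict


def intra_similarity(column, x, y):
    """Frequency-based similarity of two values within one attribute.

    Assumed form: f(x)*f(y) / (f(x) + f(y) + f(x)*f(y)), where f counts
    occurrences in the column. Returns 0.0 if either value never occurs.
    """
    counts = Counter(column)
    fx, fy = counts[x], counts[y]
    if fx == 0 or fy == 0:
        return 0.0
    return (fx * fy) / (fx + fy + fx * fy)


def conditional_distribution(rows, j, k):
    """Estimate P(attribute k = w | attribute j = v) from co-occurrence counts."""
    co_occurrence = defaultdict(Counter)
    for row in rows:
        co_occurrence[row[j]][row[k]] += 1
    return {
        v: {w: c / sum(cnt.values()) for w, c in cnt.items()}
        for v, cnt in co_occurrence.items()
    }


def inter_similarity(rows, j, k, x, y):
    """Intersection-based similarity of values x, y of attribute j via attribute k.

    Assumed form: sum over values w of attribute k that co-occur with both
    x and y of min(P(w | x), P(w | y)).
    """
    dist = conditional_distribution(rows, j, k)
    px, py = dist.get(x, {}), dist.get(y, {})
    shared = px.keys() & py.keys()  # only the intersection is visited
    return sum(min(px[w], py[w]) for w in shared)


# Toy data: attribute 0 takes values a/b, attribute 1 takes values p/q.
rows = [("a", "p"), ("a", "q"), ("b", "p"), ("b", "p")]
intra = intra_similarity([r[0] for r in rows], "a", "b")
inter = inter_similarity(rows, 0, 1, "a", "b")
```

Restricting the sum to the shared values is what makes an intersection-based variant cheap: values of the other attribute that co-occur with only one of the two inputs contribute nothing and are never visited. How the paper aggregates the per-attribute scores and combines them with the intra-coupled score is not specified in the abstract and is left out of this sketch.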