Coupled nominal similarity in unsupervised learning

Authors:
Can Wang;Longbing Cao;Mingchun Wang;Jinjiu Li;Wei Wei;Yuming Ou
Affiliations:
University of Technology, Sydney, Sydney, Australia;University of Technology, Sydney, Sydney, Australia;Tianjin University of Technology and Education, Tianjin, China;University of Technology, Sydney, Sydney, Australia;University of Technology, Sydney, Sydney, Australia;University of Technology, Sydney, Sydney, Australia
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Abstract

Measuring the similarity between nominal objects is not straightforward, especially in unsupervised learning. This paper proposes coupled similarity metrics for nominal objects that consider not only the intra-coupled similarity within an attribute (i.e., the value frequency distribution) but also the inter-coupled similarity between attributes (i.e., feature dependency aggregation). Four metrics are designed to calculate the inter-coupled similarity between two categorical values by considering their relationships with other attributes. Theoretical analysis shows that the four metrics are equivalent in accuracy and that the intersection-based metric is more efficient than the others, particularly on large-scale data. Extensive experiments on UCI data sets confirm these theoretical conclusions. In addition, clustering experiments based on the derived dissimilarity metrics show a significant performance improvement.
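The abstract describes the two ingredients only in words. The Python sketch below shows one way they could be computed on a toy table; it is not the paper's definition. The frequency-based intra-coupled formula, the conditional-probability inter-coupled term, the equal attribute weights, the product combination, and every function name are assumptions made for illustration. The sketch also suggests why an intersection-based metric can match a union-based one in accuracy while visiting fewer values: for two conditional distributions, the sum of min(p, q) over the values they share equals 1 - 0.5 * sum(|p - q|) over all values.

```python
from collections import Counter

# Hypothetical toy data set (not from the paper): each row is an object,
# each column a nominal attribute (colour, size, label).
DATA = [
    ("red", "small", "A"),
    ("red", "small", "A"),
    ("blue", "small", "B"),
    ("blue", "large", "B"),
    ("green", "large", "A"),
]


def intra_coupled(data, j, x, y):
    """Intra-coupled similarity of values x and y of attribute j.

    Uses only value frequencies. The form fx*fy / (fx + fy + fx*fy) is an
    assumption for illustration; x and y are assumed to occur in data.
    """
    counts = Counter(row[j] for row in data)
    fx, fy = counts[x], counts[y]
    return fx * fy / (fx + fy + fx * fy)


def conditional(data, j, x, k):
    """Empirical distribution P(attribute k = w | attribute j = x) as a dict."""
    co = Counter(row[k] for row in data if row[j] == x)
    total = sum(co.values())
    return {w: c / total for w, c in co.items()}


def relative_union(data, j, k, x, y):
    """1 - total variation distance, computed over the union of values of k."""
    px, py = conditional(data, j, x, k), conditional(data, j, y, k)
    keys = px.keys() | py.keys()
    return 1 - 0.5 * sum(abs(px.get(w, 0.0) - py.get(w, 0.0)) for w in keys)


def relative_intersection(data, j, k, x, y):
    """Sum of min probabilities over values of k co-occurring with BOTH x and y.

    Equal to relative_union, since min(p, q) = (p + q - |p - q|) / 2 and each
    distribution sums to 1, but it only iterates over the intersection.
    """
    px, py = conditional(data, j, x, k), conditional(data, j, y, k)
    return sum(min(px[w], py[w]) for w in px.keys() & py.keys())


def inter_coupled(data, j, x, y):
    """Average the relative similarity over every other attribute k != j.

    Equal weights are an assumption; requires at least two attributes.
    """
    others = [k for k in range(len(data[0])) if k != j]
    return sum(relative_intersection(data, j, k, x, y) for k in others) / len(others)


def coupled_value_similarity(data, j, x, y):
    """Combine intra- and inter-coupled similarity (a product is assumed here)."""
    return intra_coupled(data, j, x, y) * inter_coupled(data, j, x, y)


def object_similarity(data, u, v):
    """Similarity of two objects: sum of per-attribute coupled value similarities."""
    return sum(coupled_value_similarity(data, j, u[j], v[j]) for j in range(len(u)))


if __name__ == "__main__":
    print(coupled_value_similarity(DATA, 0, "red", "blue"))
    print(coupled_value_similarity(DATA, 0, "red", "green"))
    print(object_similarity(DATA, DATA[0], DATA[4]))
```

In this toy table "red" comes out closer to "green" than to "blue", even though "red" and "blue" have equal frequencies, because "red" and "green" co-occur with the same label. Recomputing the conditional distributions on every call keeps the sketch short but rescans the data each time; a practical version would precompute them once per attribute pair.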