Hierarchical clustering of mixed data based on distance hierarchy

Authors:
Chung-Chian Hsu;Chin-Long Chen;Yu-Wei Su
Affiliations:
Department of Information Management, National Yunlin University of Science and Technology, Douliu, Yunlin 640, Taiwan;Department of Information Management, National Yunlin University of Science and Technology, Douliu, Yunlin 640, Taiwan;Department of Information Management, National Yunlin University of Science and Technology, Douliu, Yunlin 640, Taiwan
Venue:
Information Sciences: an International Journal
Year:
2007

Abstract

Data clustering is an important data mining technique that partitions data according to some similarity criterion. Many algorithms have been proposed for clustering numerical data, and some recent research tackles the problem of clustering categorical or mixed data. Unlike numerical attributes, whose distance can be measured by subtraction, categorical attributes have no standard way of measuring the distance between their values. In this article, we propose a distance representation scheme, the distance hierarchy, which makes it easy to express the similarity between categorical values and also unifies distance measurement for numerical and categorical values. We then apply the scheme to mixed data clustering, in particular by integrating it with a hierarchical clustering algorithm. Consequently, this integrated approach can handle numerical and categorical data uniformly, and it also allows the similarity between categorical values to be taken into consideration. Experimental results show that the proposed approach produces better clustering results than conventional clustering algorithms when categorical attributes are present and their values have different degrees of similarity.
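The following Python code is a minimal sketch of the idea under stated assumptions, not the authors' implementation. It models a distance hierarchy as a tree with weighted links, places each value at a point given by an (anchor, offset) pair, and takes the distance between two points as the total link weight between them: offset_a + offset_b - 2 × offset of their least common point. A numeric attribute is treated as a degenerate hierarchy with a single MIN-MAX link, so its distance reduces to the difference of the (normalized) values. The resulting mixed distance is then fed to a plain average-linkage agglomerative procedure. The drink hierarchy, the link weights, the range normalization, the Euclidean combination across attributes, and the average-linkage choice are all hypothetical illustrations.

```python
import math
from itertools import combinations


class DistanceHierarchy:
    """Tree with weighted links. A point is (anchor, offset): the anchor is a node,
    the offset is the weighted distance from the root along the path to the anchor."""

    def __init__(self, root):
        self.parent = {root: None}
        self.depth = {root: 0.0}  # weighted distance from the root to each node

    def add_link(self, child, parent, weight):
        self.parent[child] = parent
        self.depth[child] = self.depth[parent] + weight

    def _path_to_root(self, node):
        path = []
        while node is not None:
            path.append(node)
            node = self.parent[node]
        return path

    def _lca_depth(self, a, b):
        # Depth of the lowest common ancestor of nodes a and b.
        on_a_path = set(self._path_to_root(a))
        for node in self._path_to_root(b):
            if node in on_a_path:
                return self.depth[node]
        raise ValueError("nodes belong to different trees")

    def distance(self, p, q):
        (a, off_a), (b, off_b) = p, q
        # Least common point: where the two root-to-point paths diverge.
        lcp = min(off_a, off_b, self._lca_depth(a, b))
        return off_a + off_b - 2 * lcp


def categorical_point(dh, value):
    return (value, dh.depth[value])  # a categorical value sits at its own node


def numeric_point(value, lo, hi):
    # Hypothetical normalization: offset along a single MIN->MAX link of weight 1.
    return ("MAX", (value - lo) / (hi - lo))


# Hypothetical "drink" hierarchy: similar drinks share a parent (link weight 0.25).
drinks = DistanceHierarchy("ANY")
for group, members in {"Juice": ["Apple", "Orange"], "Coffee": ["Latte", "Mocha"]}.items():
    drinks.add_link(group, "ANY", 0.25)
    for m in members:
        drinks.add_link(m, group, 0.25)

# A numeric attribute as a degenerate hierarchy: one MIN->MAX link.
amount = DistanceHierarchy("MIN")
amount.add_link("MAX", "MIN", 1.0)
LO, HI = 0.0, 40.0  # assumed attribute range


def record_distance(r, s):
    d_drink = drinks.distance(categorical_point(drinks, r[0]), categorical_point(drinks, s[0]))
    d_amount = amount.distance(numeric_point(r[1], LO, HI), numeric_point(s[1], LO, HI))
    return math.hypot(d_drink, d_amount)  # assumption: Euclidean combination


def agglomerative(records, dist, k):
    """Average-linkage agglomerative clustering down to k clusters."""
    clusters = [[i] for i in range(len(records))]
    d = {(i, j): dist(records[i], records[j])
         for i, j in combinations(range(len(records)), 2)}

    def pair_d(i, j):
        return d[(i, j)] if i < j else d[(j, i)]

    def linkage(c1, c2):
        return sum(pair_d(i, j) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


records = [("Apple", 10), ("Orange", 12), ("Latte", 30), ("Mocha", 32)]
print(agglomerative(records, record_distance, k=2))  # [[0, 1], [2, 3]]
```

Because Apple and Orange share the Juice parent, their distance (0.5) is smaller than that between Apple and Latte (1.0), which meet only at the root. A flat match/mismatch measure would assign both pairs the same distance.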