A fuzzy k-prototype clustering algorithm for mixed numeric and categorical data

Authors:
Jinchao Ji;Wei Pang;Chunguang Zhou;Xiao Han;Zhe Wang
Affiliations:
College of Computer Science and Technology, Jilin University, Changchun 130012, China;College of Computer Science and Technology, Jilin University, Changchun 130012, China and School of Natural and Computing Sciences, University of Aberdeen, Aberdeen, AB24 3UE, UK;College of Computer Science and Technology, Jilin University, Changchun 130012, China;College of Mathematics, Jilin University, Changchun 130012, China;College of Computer Science and Technology, Jilin University, Changchun 130012, China
Venue:
Knowledge-Based Systems
Year:
2012


Abstract

In many applications, data objects are described by both numeric and categorical features. The k-prototype algorithm is one of the most important algorithms for clustering this type of data. However, it performs a hard partition, which may lead to misclassification of data objects that lie on the boundaries between regions, and its dissimilarity measure relies only on a user-given parameter to adjust the significance of attributes. In this paper, we first combine the mean and the fuzzy centroid to represent the prototype of a cluster, and employ a new measure based on the co-occurrence of values to evaluate the dissimilarity between data objects and cluster prototypes. This measure also takes into account the significance of different attributes to the clustering process. We then present our algorithm for clustering mixed data. Finally, we demonstrate the performance of the proposed method through a series of experiments on four real-world datasets, in comparison with traditional clustering algorithms.
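To make the contrast between hard and fuzzy partitioning concrete, the following is a minimal Python sketch. It is not the algorithm proposed in the paper. It assumes the classic k-prototype dissimilarity (squared Euclidean distance on numeric attributes plus a user-given weight times the number of categorical mismatches) and the standard fuzzy c-means membership update. The function names `mixed_dissimilarity` and `fuzzy_memberships`, the weight `gamma`, and the fuzzifier `m` are illustrative choices; the co-occurrence-based measure and attribute significance described in the abstract are not reproduced here.

```python
import numpy as np


def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Classic k-prototype style dissimilarity (an assumption, not the paper's measure).

    Numeric part: squared Euclidean distance to the prototype's mean.
    Categorical part: number of mismatching values, weighted by gamma.
    """
    numeric_part = float(np.sum((np.asarray(x_num, dtype=float)
                                 - np.asarray(proto_num, dtype=float)) ** 2))
    categorical_part = sum(a != b for a, b in zip(x_cat, proto_cat))
    return numeric_part + gamma * categorical_part


def fuzzy_memberships(dissimilarities, m=2.0, eps=1e-12):
    """Standard fuzzy c-means membership update for a single object.

    `dissimilarities` holds the object's (squared-distance-like) dissimilarity
    to each of the k prototypes; `m` > 1 is the fuzzifier.
    """
    d = np.asarray(dissimilarities, dtype=float)
    zero = d < eps
    if np.any(zero):
        # The object coincides with one or more prototypes:
        # share full membership among those prototypes only.
        u = zero.astype(float)
        return u / u.sum()
    inv = d ** (-1.0 / (m - 1.0))
    return inv / inv.sum()


# One object with two numeric and two categorical attributes, two prototypes.
x_num, x_cat = [1.0, 2.0], ["a", "b"]
d0 = mixed_dissimilarity(x_num, x_cat, [1.0, 2.0], ["a", "c"])  # 0 + 1 mismatch
d1 = mixed_dissimilarity(x_num, x_cat, [3.0, 2.0], ["b", "c"])  # 4 + 2 mismatches
u = fuzzy_memberships([d0, d1], m=2.0)
print(d0, d1, u)  # memberships are roughly 0.857 and 0.143
```

A hard partition would assign this object entirely to the first prototype. The fuzzy memberships keep a nonzero degree of belonging to the second, which is the kind of information that matters for objects near cluster boundaries.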