A fuzzy c-means-type algorithm for clustering of data with mixed numeric and categorical attributes employing a probabilistic dissimilarity functional

Authors:
Sotirios P. Chatzis
Affiliations:
Department of Electrical and Electronic Engineering, Imperial College London, Exhibition Road, South Kensington Campus SW7 2BT, UK
Venue:
Expert Systems with Applications: An International Journal
Year:
2011

Abstract

The Gath-Geva (GG) algorithm is one of the most popular methods for fuzzy c-means (FCM)-type clustering of data with numeric attributes. It assumes that the data derive from clusters of Gaussian form, a much more flexible construction than the spherical-cluster assumption of the original FCM. In this paper, we introduce an extension of the GG algorithm that effectively handles data with mixed numeric and categorical attributes. Traditionally, fuzzy clustering of such data is conducted by means of the fuzzy k-prototypes algorithm, which merely executes the original FCM algorithm with a different dissimilarity functional, one suitable for data with mixed numeric and categorical attributes. In contrast, in this work we provide a novel FCM-type algorithm employing a fully probabilistic dissimilarity functional for handling data with mixed-type attributes. Our approach utilizes a fuzzy objective function regularized by Kullback-Leibler (KL) divergence information, and is formulated on the basis of a set of probabilistic assumptions regarding the form of the derived clusters. We evaluate the efficacy of the proposed approach using benchmark data, and we compare it with competing fuzzy and non-fuzzy clustering algorithms.
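
The abstract does not give the update equations, so the following is only a minimal sketch of how a KL-regularized FCM-type scheme with a probabilistic dissimilarity could look. It is not the paper's algorithm. It assumes the dissimilarity is the negative log-likelihood of a full-covariance Gaussian (numeric part) times independent categorical distributions (categorical part), and that memberships are updated as u_ij ∝ pi_j exp(-d_ij / lam), the standard closed form for a KL-regularized fuzzy objective. The function name `kl_fcm_mixed`, the parameter `lam`, and the smoothing constants are hypothetical.

```python
import numpy as np

def kl_fcm_mixed(X_num, X_cat, n_clusters, lam=1.0, n_iter=50, seed=0):
    """Sketch of KL-regularized fuzzy clustering for mixed-type data.

    X_num: (n, d) float array of numeric attributes.
    X_cat: (n, q) int array of category codes starting at 0.
    Returns the (n, n_clusters) fuzzy membership matrix.
    """
    rng = np.random.default_rng(seed)
    n, d = X_num.shape
    q = X_cat.shape[1]
    n_levels = X_cat.max(axis=0) + 1
    # Random initial memberships; each row sums to 1.
    U = rng.dirichlet(np.ones(n_clusters), size=n)
    for _ in range(n_iter):
        w = U.sum(axis=0) + 1e-12          # fuzzy cluster sizes
        pi = w / w.sum()                   # cluster prior weights
        D = np.zeros((n, n_clusters))      # dissimilarity = -log density
        for j in range(n_clusters):
            # Numeric part: membership-weighted Gaussian (assumption).
            mu = U[:, j] @ X_num / w[j]
            diff = X_num - mu
            cov = (U[:, j, None] * diff).T @ diff / w[j] + 1e-6 * np.eye(d)
            _, logdet = np.linalg.slogdet(cov)
            maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
            D[:, j] = 0.5 * (maha + logdet + d * np.log(2 * np.pi))
            # Categorical part: membership-weighted, smoothed frequencies.
            for a in range(q):
                k = int(n_levels[a])
                counts = np.bincount(X_cat[:, a], weights=U[:, j], minlength=k)
                probs = (counts + 1e-3) / (counts.sum() + 1e-3 * k)
                D[:, j] -= np.log(probs[X_cat[:, a]])
        # Membership update: u_ij proportional to pi_j * exp(-d_ij / lam).
        logits = np.log(pi) - D / lam
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        U = np.exp(logits)
        U /= U.sum(axis=1, keepdims=True)
    return U
```

Under these assumptions, `lam` controls the fuzziness of the partition: with `lam = 1` the updates coincide with EM for a finite mixture model, and larger values give softer memberships. Hard labels, if needed, can be read off with `U.argmax(axis=1)`.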