Rough Set-Based Clustering with Refinement Using Shannon's Entropy Theory

Authors:
Chun-Bao Chen;Li-Ya Wang
Affiliations:
Department of Industrial Engineering and Management, Shanghai Jiao Tong University, Shanghai 200030, P.R. China;Department of Industrial Engineering and Management, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
Venue:
Computers & Mathematics with Applications
Year:
2006

Abstract

Many clustering algorithms have been developed, but most cannot process objects in a hybrid numerical/nominal attribute space or objects with missing values. Most also require the number of clusters to be set manually, and their results are sensitive to the input order of the objects to be clustered. These limitations restrict the applicability of clustering and reduce the quality of its results. To address these problems, an improved clustering algorithm based on rough set (RS) theory and entropy theory is presented. It avoids the need to prespecify the number of clusters, and it clusters in both numerical and nominal attribute spaces by using a similarity measure in place of a distance index. RS theory also gives the algorithm the ability to deal with vagueness and uncertainty in data analysis. Shannon's entropy is used to refine the clustering results by assigning relative weights to the attributes according to their mutual entropy values. A novel measure of clustering quality is also presented to evaluate the clusters. The algorithm is analyzed and then applied to cluster a data set for an industrial product. The experimental results confirm that the efficiency and clustering quality of the algorithm are improved.
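
The abstract does not give the paper's formulas, so the following Python sketch is only an illustration of the two ideas it names: entropy-based attribute weighting and a similarity measure that works on mixed numerical/nominal attributes with missing values. Everything in it is an assumption rather than the authors' method. The weights are taken as normalized mutual information between each (discrete or pre-discretized) attribute and a current cluster labeling, the similarity is a Gower-style weighted average swapped in for the paper's unspecified measure, and the names `attribute_weights`, `similarity`, and `ranges` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned discrete sequences."""
    joint = list(zip(xs, ys))
    mi = shannon_entropy(xs) + shannon_entropy(ys) - shannon_entropy(joint)
    return max(0.0, mi)  # clamp tiny negative values caused by rounding


def attribute_weights(columns, labels):
    """Assumed weighting scheme (not from the paper): each attribute's weight
    is its mutual information with the cluster labels, normalized to sum to 1."""
    scores = [mutual_information(col, labels) for col in columns]
    total = sum(scores)
    if total == 0:
        # No attribute is informative: fall back to equal weights.
        return [1.0 / len(columns)] * len(columns)
    return [s / total for s in scores]


def similarity(a, b, weights, ranges):
    """Gower-style weighted similarity in [0, 1] between two objects.

    ranges[k] is the value range of numerical attribute k, or None when
    attribute k is nominal. Attributes missing (None) in either object
    are skipped, so objects with missing values can still be compared.
    """
    num = 0.0
    den = 0.0
    for x, y, w, r in zip(a, b, weights, ranges):
        if x is None or y is None:
            continue
        if r is None:
            s = 1.0 if x == y else 0.0          # nominal: exact match
        else:
            s = 1.0 - abs(x - y) / r if r > 0 else 1.0  # numerical: scaled gap
        num += w * s
        den += w
    return num / den if den > 0 else 0.0
```

Under these assumptions, an attribute that tracks the cluster labels closely receives a large weight and one that is independent of them receives a weight near zero, so a second clustering pass using `similarity` with the new weights would be driven mainly by the informative attributes. Numerical attributes would need to be discretized before being passed to `attribute_weights`.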