A new fuzzy co-clustering algorithm for categorization of datasets with overlapping clusters

Authors:
William-Chandra Tjhi; Lihui Chen
Affiliations:
Nanyang Technological University, Republic of Singapore; Nanyang Technological University, Republic of Singapore
Venue:
ADMA'06: Proceedings of the Second International Conference on Advanced Data Mining and Applications
Year:
2006

Abstract

Fuzzy co-clustering performs simultaneous fuzzy clustering of objects and features. In this paper, we introduce a new fuzzy co-clustering algorithm for high-dimensional datasets called Cosine-Distance-based & Dual-partitioning Fuzzy Co-clustering (CODIALING FCC). Unlike many existing fuzzy co-clustering algorithms, CODIALING FCC is a dual-partitioning algorithm: it clusters the features in the same manner as it clusters the objects, that is, by partitioning them according to their natural groupings. It is also cosine-distance-based, because it uses the cosine distance to capture the degree to which objects and features belong to each co-cluster. Our main purpose in introducing this algorithm is to improve on the performance of some prominent existing fuzzy co-clustering algorithms on datasets with highly overlapping clusters. In our opinion, this is crucial, since most real-world datasets have a significant amount of overlap in their inherent clustering structures. We discuss how the dual-partitioning formulation makes this improvement possible. Experimental results on a toy problem and five large benchmark document datasets demonstrate that CODIALING FCC handles overlaps more effectively.
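
The two ideas named in the abstract, cosine distance and partitioning both objects and features, can be illustrated with a minimal Python sketch. This is not the CODIALING FCC update rule, which the abstract does not give. The membership formula is an inverse-distance weighting in the style of fuzzy c-means, chosen here only for illustration, and the toy matrix `X`, the prototypes, and the helper names `cosine_distance` and `partition_memberships` are all assumptions.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cos(a, b) for two nonzero vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def partition_memberships(item, prototypes, m=2.0, eps=1e-12):
    """Fuzzy memberships of one item across all clusters, summing to 1.

    Assumption: inverse-distance weighting in the style of fuzzy c-means,
    used for illustration only; it is not taken from the paper.
    """
    dists = [max(cosine_distance(item, p), eps) for p in prototypes]
    weights = [d ** (-1.0 / (m - 1.0)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical toy document-term counts: rows are documents, columns are terms.
X = [
    [3, 2, 0, 0],
    [2, 3, 1, 0],
    [0, 1, 3, 2],
    [0, 0, 2, 3],
]

# Hypothetical cluster prototypes: in term space for documents,
# and in document space for terms (the columns of X).
doc_prototypes = [[1, 1, 0, 0], [0, 0, 1, 1]]
term_prototypes = [[1, 1, 0, 0], [0, 0, 1, 1]]

terms = [list(col) for col in zip(*X)]

# Dual partitioning in this sketch: documents AND terms each receive
# memberships that sum to 1 across the clusters.
doc_u = [partition_memberships(d, doc_prototypes) for d in X]
term_v = [partition_memberships(t, term_prototypes) for t in terms]

for i, row in enumerate(doc_u):
    print("doc", i, [round(u, 3) for u in row])
for j, row in enumerate(term_v):
    print("term", j, [round(v, 3) for v in row])
```

In this sketch every document and every term is assigned across the clusters under the same sum-to-one constraint, which is one reading of clustering the features "in the same manner" as the objects. An item lying between two prototypes receives comparable memberships in both, which is how a fuzzy partition represents overlap.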