Web document clustering: a feasibility demonstration
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Data mining: concepts and techniques
Data mining: concepts and techniques
Pattern Recognition with Fuzzy Objective Function Algorithms
Pattern Recognition with Fuzzy Objective Function Algorithms
Modern Information Retrieval
A matrix density based algorithm to hierarchically co-cluster documents and words
WWW '03 Proceedings of the 12th international conference on World Wide Web
Data Mining: Concepts and Algorithms From Multimedia to Bioinformatics
Data Mining: Concepts and Algorithms From Multimedia to Bioinformatics
Information-theoretic co-clustering
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
A generalized maximum entropy approach to bregman co-clustering and matrix approximation
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Biclustering Algorithms for Biological Data Analysis: A Survey
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Possibilistic fuzzy co-clustering of large document collections
Pattern Recognition
Hi-index | 0.00 |
Fuzzy co-clustering is a method that performs simultaneous fuzzy clustering of objects and features. In this paper, we introduce a new fuzzy co-clustering algorithm for high-dimensional datasets called Cosine-Distance-based & Dual-partitioning Fuzzy Co-clustering (CODIALING FCC). Unlike many existing fuzzy co-clustering algorithms, CODIALING FCC is a dual-partitioning algorithm. It clusters the features in the same manner as it clusters the objects, that is, by partitioning them according to their natural groupings. It is also a cosine-distance-based algorithm because it utilizes the cosine distance to capture the belongingness of objects and features in the co-clusters. Our main purpose of introducing this new algorithm is to improve the performance of some prominent existing fuzzy co-clustering algorithms in dealing with datasets with high overlaps. In our opinion, this is very crucial since most real-world datasets involve significant amount of overlaps in their inherent clustering structures. We discuss how this improvement can be made through the dual-partitioning formulation adopted. Experimental results on a toy problem and five large benchmark document datasets demonstrate the effectiveness of CODIALING FCC in handling overlaps better.