Possibilistic fuzzy co-clustering of large document collections

  • Authors:
  • William-Chandra Tjhi; Lihui Chen

  • Affiliations:
  • Nanyang Technological University, Republic of Singapore; Nanyang Technological University, Republic of Singapore

  • Venue:
  • Pattern Recognition
  • Year:
  • 2007

Abstract

In this paper we propose a new co-clustering algorithm, possibilistic fuzzy co-clustering (PFCC), for the automatic categorization of large document collections. PFCC integrates a possibilistic document clustering technique with a combined formulation of fuzzy word ranking and partitioning, yielding a fast iterative co-clustering procedure. This framework offers several benefits at once: robustness to document and word outliers, rich representations of co-clusters, highly descriptive document clusters, good performance in high-dimensional spaces, and reduced sensitivity to initialization, a known weakness of possibilistic clustering. We present the detailed formulation of PFCC and explain the motivation behind it. We also discuss its advantages over existing approaches and provide a proof of the algorithm's convergence. Experiments on several large document data sets demonstrate the effectiveness of PFCC.
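The robustness to outliers comes from the possibilistic component. The sketch below illustrates that idea with the classic possibilistic c-means (PCM) typicality update of Krishnapuram and Keller, t_ij = 1 / (1 + (d_ij^2 / eta_j)^(1/(m-1))). It is not PFCC's own update rule, which the abstract does not give. The function name `possibilistic_memberships`, the use of squared Euclidean distance, and the parameter values are illustrative assumptions.

```python
import numpy as np

def possibilistic_memberships(X, centers, eta, m=2.0):
    """Classic PCM typicality (Krishnapuram & Keller), NOT the PFCC update.

    X: (n_points, n_features), centers: (n_clusters, n_features),
    eta: (n_clusters,) per-cluster scale parameters, m > 1: fuzzifier.
    Returns an (n_points, n_clusters) array of typicalities in (0, 1].
    """
    # Squared Euclidean distance from every point to every center (assumed metric).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Rows are NOT constrained to sum to 1, so a far-away point can be
    # atypical of every cluster instead of being forced into one.
    return 1.0 / (1.0 + (d2 / eta[None, :]) ** (1.0 / (m - 1.0)))

X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [50.0, 50.0]])
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
eta = np.array([1.0, 1.0])
T = possibilistic_memberships(X, centers, eta)
print(np.round(T, 4))
```

In this toy run the last point (a hypothetical outlier) receives typicality below 0.001 in both clusters. A probabilistic fuzzy c-means membership would instead force its memberships to sum to 1 and give it roughly equal weight in each cluster.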