Fuzzy semi-supervised co-clustering for text documents

  • Authors:
  • Yang Yan;Lihui Chen;William-Chandra Tjhi

  • Affiliations:
  • Nanyang Technological University, School of Electrical and Electronic Engineering, Republic of Singapore;Nanyang Technological University, School of Electrical and Electronic Engineering, Republic of Singapore;Nanyang Technological University, School of Electrical and Electronic Engineering, Republic of Singapore

  • Venue:
  • Fuzzy Sets and Systems
  • Year:
  • 2013

Quantified Score

Hi-index 0.20

Abstract

In this paper we propose a new heuristic semi-supervised fuzzy co-clustering algorithm (SS-HFCR) for the categorization of large collections of web documents. The approach incorporates prior knowledge, supplied by users as pairwise constraints, into the fuzzy co-clustering framework. Each constraint specifies whether a pair of documents "must" or "cannot" be clustered together. We also formulate a competitive agglomeration cost function that makes use of this prior knowledge during clustering. Experimental studies on a number of large benchmark datasets demonstrate the strength and potential of SS-HFCR in terms of accuracy, stability and efficiency, compared with several recent popular semi-supervised clustering approaches.
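
To make the idea of pairwise constraints concrete, the following is a minimal sketch of how "must" and "cannot" constraints might be scored against fuzzy document memberships. It is not the SS-HFCR cost function, which the abstract does not spell out. The function name `constraint_penalty`, the membership layout, and the product-based penalty are all assumptions chosen for illustration.

```python
def constraint_penalty(memberships, must_link, cannot_link):
    """Score soft violations of pairwise constraints (illustrative only).

    memberships[i][c] is the degree to which document i belongs to
    cluster c. must_link and cannot_link are lists of (i, j) index pairs.
    The penalty form is an assumption, not taken from the paper.
    """
    penalty = 0.0
    for i, j in must_link:
        # Membership mass that the pair shares within the same cluster.
        same = sum(a * b for a, b in zip(memberships[i], memberships[j]))
        # Mass over all cluster combinations, same or different.
        total = sum(memberships[i]) * sum(memberships[j])
        # A must-link pair is penalised for mass on different clusters.
        penalty += total - same
    for i, j in cannot_link:
        # A cannot-link pair is penalised for mass on the same cluster.
        penalty += sum(a * b for a, b in zip(memberships[i], memberships[j]))
    return penalty


# Hypothetical example: three documents, two clusters.
memberships = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
print(constraint_penalty(memberships, must_link=[(0, 1)], cannot_link=[(0, 2)]))
```

With crisp memberships that satisfy every constraint the penalty is zero, and it grows as constrained pairs drift toward the wrong grouping. A semi-supervised method could add such a term to its clustering objective, weighted against the unsupervised part.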