Fuzzy Co-clustering of Web Documents

  • Authors: William-Chandra Tjhi; Lihui Chen

  • Affiliations: Nanyang Technological University, Singapore (both authors)

  • Venue: CW '05 Proceedings of the 2005 International Conference on Cyberworlds
  • Year: 2005

Abstract

The Web is the largest information repository in human history. Because of its sheer size, however, finding relevant information without an appropriate tool can be virtually impossible. Web document clustering is one technique for making information finding more efficient. In this paper, we investigate fuzzy co-clustering, which is known to be robust for clustering standard text documents. We argue that this robustness extends to web documents, because fuzzy co-clustering can generate descriptive clusters in high-dimensional data and can discover overlapping clusters. We consider two existing fuzzy co-clustering algorithms, FCCM and Fuzzy Codok. In addition, we propose a new algorithm, FCC-STF, as an alternative to the existing ones. An empirical study of these algorithms on benchmark datasets is presented, together with a performance comparison against a standard fuzzy clustering algorithm, HFCM. The results show that fuzzy co-clustering is generally superior to standard fuzzy clustering in the Web environment, making it a technique with great potential to help Internet users discover relevant information effectively.
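
The abstract does not state any update rules, so the following is only an illustrative sketch of what fuzzy co-clustering can look like, not the authors' FCCM, Fuzzy Codok, or FCC-STF implementation. It assumes entropy-regularized alternating updates of the kind commonly associated with FCCM-style methods: document memberships sum to 1 across clusters (which allows overlaps), and word memberships sum to 1 within each cluster (which gives each cluster a descriptive word ranking). The function names, the temperatures `t_u` and `t_v`, the seed-document initialization, and the toy matrix are all hypothetical.

```python
import math

def softmax(scores, temperature):
    """Numerically stable softmax of scores / temperature."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuzzy_cocluster(D, n_clusters, t_u=0.1, t_v=1.0, n_iter=30):
    """Alternate fuzzy document (u) and word (v) membership updates.

    D is a documents-by-words matrix of non-negative weights.
    Returns u[c][i] (cluster c, document i) and v[c][j] (cluster c, word j).
    """
    n_docs, n_words = len(D), len(D[0])
    # Hypothetical initialization: seed each cluster's word memberships from
    # an evenly spaced document row (rows are assumed to have a nonzero sum).
    v = []
    for c in range(n_clusters):
        seed = D[c * n_docs // n_clusters]
        total = sum(seed)
        v.append([x / total for x in seed])
    u = [[1.0 / n_clusters] * n_docs for _ in range(n_clusters)]

    for _ in range(n_iter):
        # Document memberships: clusters compete for each document,
        # so the memberships of one document sum to 1 across clusters.
        for i in range(n_docs):
            scores = [sum(v[c][j] * D[i][j] for j in range(n_words))
                      for c in range(n_clusters)]
            for c, m in enumerate(softmax(scores, t_u)):
                u[c][i] = m
        # Word memberships: words compete inside each cluster,
        # so the memberships of one cluster sum to 1 across words.
        for c in range(n_clusters):
            scores = [sum(u[c][i] * D[i][j] for i in range(n_docs))
                      for j in range(n_words)]
            v[c] = softmax(scores, t_v)
    return u, v

# Toy data: documents 0-1 favour words 0-1, documents 2-3 favour words 2-3,
# and document 4 uses all four words equally (an overlapping document).
D = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.8, 0.0, 0.1],
    [0.0, 0.0, 1.0, 1.0],
    [0.1, 0.0, 0.8, 1.0],
    [0.5, 0.5, 0.5, 0.5],
]
u, v = fuzzy_cocluster(D, n_clusters=2)
```

In this toy run the evenly mixed document ends up with roughly equal membership in both clusters, which is the kind of overlap a hard clustering cannot express, while each row of `v` ranks the words that describe its cluster.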