Semi-supervised hierarchical co-clustering

  • Authors:
  • Feifei Huang;Yan Yang;Tao Li;Jinyuan Zhang;Tonny Rutayisire;Amjad Mahmood

  • Affiliations:
  • School of Information Science and Technology, Provincial Key Lab of Cloud Computing and Intelligent Technology, Southwest Jiaotong University, Chengdu, P.R. China;School of Information Science and Technology, Provincial Key Lab of Cloud Computing and Intelligent Technology, Southwest Jiaotong University, Chengdu, P.R. China;School of Computer Science, Florida International University, Miami, FL;School of Information Science and Technology, Provincial Key Lab of Cloud Computing and Intelligent Technology, Southwest Jiaotong University, Chengdu, P.R. China;School of Information Science and Technology, Provincial Key Lab of Cloud Computing and Intelligent Technology, Southwest Jiaotong University, Chengdu, P.R. China;School of Information Science and Technology, Provincial Key Lab of Cloud Computing and Intelligent Technology, Southwest Jiaotong University, Chengdu, P.R. China

  • Venue:
  • RSKT'12 Proceedings of the 7th international conference on Rough Sets and Knowledge Technology
  • Year:
  • 2012

Abstract

Hierarchical co-clustering aims to generate dendrograms for both the rows and the columns of the input data matrix. Applying simple hierarchical co-clustering to document clustering has two limitations: the data contain a large number of feature terms and documents, and the method ignores the semantic relations between feature terms. This paper proposes a semi-supervised clustering algorithm for hierarchical co-clustering. In the first step, feature terms are clustered using a small amount of supervised information. In the second step, the clustered feature terms are merged into new feature attributes. In the last step, the documents and the merged feature terms are clustered using a hierarchical co-clustering algorithm. Semantic information is used to measure similarity during the hierarchical co-clustering process. Experimental results show that the proposed algorithm is effective and efficient.
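The three steps named in the abstract can be illustrated with a rough sketch. This is not the authors' algorithm, and every modeling choice in it is an assumption made for illustration. The supervised information is taken to be a list of hypothetical must-link pairs between term indices. Term clustering is reduced to grouping terms connected by those pairs. The final step clusters rows and columns separately, using average-linkage agglomerative merging with cosine similarity as a stand-in for a true hierarchical co-clustering procedure and for the semantic similarity measure. All function names are invented for this example.

```python
from math import sqrt

def group_terms(n_terms, must_link):
    """Step 1 (simplified assumption): group terms joined by must-link pairs."""
    parent = list(range(n_terms))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for a, b in must_link:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    groups = {}
    for t in range(n_terms):
        groups.setdefault(find(t), []).append(t)
    return [groups[r] for r in sorted(groups)]

def merge_terms(matrix, groups):
    """Step 2: sum the columns of each term group into one merged attribute."""
    return [[sum(row[t] for t in g) for g in groups] for row in matrix]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def dendrogram(vectors):
    """Step 3 (stand-in): average-linkage agglomerative merging.

    Returns a list of (cluster_a, cluster_b, similarity) merges. Leaves are
    numbered 0..n-1 and each merge creates the next unused cluster id.
    """
    clusters = {i: [i] for i in range(len(vectors))}
    merges = []
    next_id = len(vectors)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                sims = [cosine(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b]]
                s = sum(sims) / len(sims)
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, round(s, 3)))
        next_id += 1
    return merges

# Toy document-term counts: 4 documents x 4 terms (made-up data).
docs = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
]
must_link = [(0, 1), (2, 3)]             # hypothetical supervision

groups = group_terms(4, must_link)       # step 1
merged = merge_terms(docs, groups)       # step 2
row_tree = dendrogram(merged)            # step 3, documents
columns = [list(c) for c in zip(*merged)]
col_tree = dendrogram(columns)           # step 3, merged feature terms

print(groups)
print(merged)
print(row_tree)
print(col_tree)
```

On this toy input, merging the four terms into two attributes shrinks the matrix that the final clustering step has to process, which is the motivation the abstract gives for the first two steps.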