Topic-constrained hierarchical clustering for document datasets

  • Authors: Ying Zhao

  • Affiliations: Department of Computer Science and Technology, Tsinghua University, Beijing, China

  • Venue: ADMA'10 Proceedings of the 6th international conference on Advanced data mining and applications: Part I
  • Year: 2010

Abstract

In this paper, we propose topic-constrained hierarchical clustering, which organizes document datasets into hierarchical trees consistent with a given set of topics. The proposed algorithm is based on a constrained agglomerative clustering framework and a semi-supervised criterion function that simultaneously emphasizes the relationship between documents and topics and the relationships among the documents themselves. The experimental evaluation shows that our algorithm outperformed the traditional agglomerative algorithm by 7.8% to 11.4%.
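
The abstract does not state the criterion function, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes documents and topics are given as dense vectors, and it scores each candidate merge with a hypothetical weighted sum of a document-document term (cosine similarity between cluster centroids) and a document-topic term (how strongly both clusters align with the same topic). The names `merge_score`, `topic_aware_agglomerative`, and the weight `alpha` are illustrative assumptions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def merge_score(a, b, topics, alpha):
    """Hypothetical criterion (an assumption, not the paper's function).

    doc_term:   similarity between the two cluster centroids.
    topic_term: best topic that BOTH clusters are close to.
    """
    ca, cb = centroid(a), centroid(b)
    doc_term = cosine(ca, cb)
    topic_term = max(min(cosine(ca, t), cosine(cb, t)) for t in topics)
    return alpha * doc_term + (1 - alpha) * topic_term


def topic_aware_agglomerative(docs, topics, alpha=0.5):
    """Greedy bottom-up merging; returns the merge history as pairs of
    tuples of original document indices."""
    clusters = {i: [d] for i, d in enumerate(docs)}
    members = {i: (i,) for i in range(len(docs))}
    merges = []
    next_id = len(docs)
    while len(clusters) > 1:
        # Pick the pair of current clusters with the highest score.
        i, j = max(
            combinations(sorted(clusters), 2),
            key=lambda p: merge_score(clusters[p[0]], clusters[p[1]], topics, alpha),
        )
        merges.append((members[i], members[j]))
        members[next_id] = tuple(sorted(members[i] + members[j]))
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return merges


# Toy data (made up for illustration): two documents near each topic axis.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
topics = [[1.0, 0.0], [0.0, 1.0]]
merges = topic_aware_agglomerative(docs, topics, alpha=0.5)
```

On this toy input, the two same-topic pairs are merged before the final cross-topic merge. Setting `alpha=1.0` reduces the score to plain centroid-based agglomerative clustering, which makes the role of the topic term easy to isolate.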