Towards effective document clustering: A constrained K-means based approach

  • Authors:
  • Guobiao Hu;Shuigeng Zhou;Jihong Guan;Xiaohua Hu

  • Affiliations:
  • Department of Computer Science and Engineering, Fudan University, Shanghai 200433, China and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China;Department of Computer Science and Engineering, Fudan University, Shanghai 200433, China and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China;Department of Computer Science and Technology, Tongji University, Shanghai 200092, China;College of Information Science and Technology, Drexel University, Philadelphia, PA 19104, USA

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2008

Abstract

Document clustering is an important tool for organizing and browsing document collections. In real applications, limited prior knowledge about the cluster membership of a small number of documents is often available, for example that certain pairs of documents belong to the same cluster. Such prior knowledge can serve as constraints on the clustering process. We integrate the constraints into the trace formulation of the sum-of-squared-Euclidean-distances criterion of K-means. The combined criterion function is then transformed into a trace maximization problem, which is optimized by eigen-decomposition. Our experimental evaluation shows that the proposed semi-supervised clustering method achieves better performance than three existing methods.