A on-line news documents clustering method

  • Authors:
  • Hui Zhang;Guo-hui Li;Xin-wen Xu

  • Affiliations:
  • Department of Engineering, School of Information System and Management, National University of Defense Technology, Changsha, China;Department of Engineering, School of Information System and Management, National University of Defense Technology, Changsha, China;College of Basic Education for officers, National University of Defense Technology, Changsha, China

  • Venue:
  • AMT'12 Proceedings of the 8th international conference on Active Media Technology
  • Year:
  • 2012

Abstract

To improve the efficiency and accuracy of on-line news event detection (ONED), we select the words whose term frequency (TF) is greater than a threshold to build the vector space model of each news document, and propose a two-stage clustering method for ONED. The method divides the detection process into two stages. In the first stage, similar documents collected within a certain period of time are clustered into micro-clusters. In the second stage, the micro-clusters are compared with previous event clusters. The experimental results show that the proposed method has a lower computational load, a higher computing rate, and less loss of accuracy.
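The two-stage idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the similarity measure, the clustering procedure, or the threshold values, so everything here is an assumption for illustration: cosine similarity, a simple single-pass clustering rule, whitespace tokenization, and the hypothetical names `tf_vector`, `single_pass`, `detect`, `min_tf`, `micro_thr`, and `event_thr`. It is not the authors' implementation.

```python
from collections import Counter
import math

def tf_vector(text, min_tf):
    """Keep only words whose raw term frequency is greater than min_tf."""
    counts = Counter(text.lower().split())  # assumed tokenization: whitespace
    return {w: c for w, c in counts.items() if c > min_tf}

def cosine(a, b):
    """Cosine similarity between two sparse term vectors (assumed measure)."""
    dot = sum(v * b.get(w, 0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def single_pass(vectors, clusters, threshold):
    """Merge each vector into its most similar cluster, or open a new one.

    A cluster is stored as the sum of its member vectors; cosine similarity
    is scale-invariant, so the sum behaves like a centroid.
    """
    for vec in vectors:
        best = max(clusters, key=lambda c: cosine(c, vec), default=None)
        if best is not None and cosine(best, vec) >= threshold:
            for w, v in vec.items():
                best[w] = best.get(w, 0) + v
        else:
            clusters.append(dict(vec))
    return clusters

def detect(batch_texts, event_clusters, min_tf=1, micro_thr=0.5, event_thr=0.5):
    """Process one time window of documents against the existing events."""
    vectors = [tf_vector(t, min_tf) for t in batch_texts]
    vectors = [v for v in vectors if v]  # skip documents with no retained terms
    # Stage 1: cluster the documents of this window into micro-clusters.
    micro = single_pass(vectors, [], micro_thr)
    # Stage 2: compare the micro-clusters with previous event clusters.
    return single_pass(micro, event_clusters, event_thr)

events = detect(
    ["quake quake city city",
     "quake quake city city rescue",
     "vote vote election election"],
    [],
)
events = detect(["quake quake city city aftershock aftershock"], events)
print(len(events))
```

In this sketch the second stage compares one vector per micro-cluster with the stored events, rather than one vector per document, which is where the reduced number of comparisons would come from under these assumptions.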