DVD: a model for event diversified versions discovery

  • Authors:
  • Liang Kong;Rui Yan;Yijun He;Yan Zhang;Zhenwei Zhang;Li Fu

  • Affiliations:
  • Department of Machine Intelligence, Peking University and Key Laboratory on Machine Perception, Ministry of Education, Beijing, China;Department of Computer Science, Peking University, Beijing, China;Department of Computer Science, Peking University, Beijing, China;Department of Machine Intelligence, Peking University and Key Laboratory on Machine Perception, Ministry of Education, Beijing, China;Service Software Chongqing Institute of ZTE Corporation, China;Service Software Chongqing Institute of ZTE Corporation, China

  • Venue:
  • APWeb'11 Proceedings of the 13th Asia-Pacific web conference on Web technologies and applications
  • Year:
  • 2011

Abstract

With advances in Event Detection and Tracking techniques, it is feasible to gather text from many sources and structure it into events that are constructed automatically online and updated over time. An event is usually described by several diversified versions, and users are often eager to know all of them. Given the huge quantity of documents, however, it is almost impossible for users to read everything. In this paper, we formally define the problem of event diversified versions discovery and introduce a novel and principled model, called DVD, for discovering the diversified versions of an event. Unlike traditional clustering methods, we apply an iterative algorithm on a bipartite graph that integrates co-occurrence and semantics to select the popular words and filter them, which reduces the tight correlation between documents within a specific event. We use hybrid link structures between words to find their hierarchical relationships. We employ a web communities discovery algorithm to construct virtual-documents, each consisting of a bag of words that indicates one of the diversified versions. Under the Rocchio classification framework, we then classify the documents into the diversified versions. Using our novel evaluation method, empirical experiments on two real datasets show that DVD is effective and outperforms various related algorithms, including classic K-means and LDA.
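
The final classification step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each virtual-document acts as the prototype (centroid) of one version, as in Rocchio-style nearest-prototype classification, and it uses raw term frequencies with cosine similarity, which the abstract does not specify. The function names, version labels, and sample words are all hypothetical.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Return a sparse term-frequency vector (word -> count)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def classify(document_tokens, virtual_documents):
    """Assign a document to the version with the most similar prototype.

    Assumption: each virtual-document (a bag of words) serves directly
    as the prototype vector of one version.
    """
    doc_vec = tf_vector(document_tokens)
    scores = {
        label: cosine(doc_vec, tf_vector(words))
        for label, words in virtual_documents.items()
    }
    return max(scores, key=scores.get)


# Hypothetical virtual-documents for two versions of one event.
virtual_documents = {
    "version_a": ["earthquake", "casualties", "rescue"],
    "version_b": ["earthquake", "nuclear", "radiation"],
}

document = "rescue teams search for casualties after the earthquake".split()
label = classify(document, virtual_documents)
print(label)
```

In this toy example the document shares three words with the first prototype and only one with the second, so it is assigned to `version_a`. A fuller treatment would likely weight terms (for example with TF-IDF) rather than use raw counts.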