DVD: a model for event diversified versions discovery

Authors:
Liang Kong;Rui Yan;Yijun He;Yan Zhang;Zhenwei Zhang;Li Fu
Affiliations:
Department of Machine Intelligence, Peking University and Key Laboratory on Machine Perception, Ministry of Education, Beijing, China;Department of Computer Science, Peking University, Beijing, China;Department of Computer Science, Peking University, Beijing, China;Department of Machine Intelligence, Peking University and Key Laboratory on Machine Perception, Ministry of Education, Beijing, China;Service Software Chongqing Institute of ZTE Corporation, China;Service Software Chongqing Institute of ZTE Corporation, China
Venue:
APWeb'11 Proceedings of the 13th Asia-Pacific web conference on Web technologies and applications
Year:
2011

Abstract

Advances in event detection and tracking make it feasible to gather text from many sources and organize it into events that are constructed automatically online and updated over time. An event is usually described in several diversified versions, and users are often eager to know all of them, yet the sheer volume of documents makes it almost impossible to read everything. In this paper, we formally define the problem of discovering diversified versions of events and introduce a novel, principled model, called DVD, to address it. Unlike traditional clustering methods, DVD applies an iterative algorithm to a bipartite graph that integrates co-occurrence and semantics in order to select popular words and filter them, which reduces the tight correlation between documents within a specific event. Hybrid link structures between words are used to find hierarchical relationships. A web community discovery algorithm then constructs virtual documents, each consisting of a bag of words that indicates one of the diversified versions. Under the Rocchio classification framework, documents are classified into the diversified versions. Using our novel evaluation method, experiments on two real datasets show that DVD is effective and outperforms various related algorithms, including classic K-means and LDA.
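The final classification step can be illustrated with a minimal sketch. It assumes that Rocchio classification here reduces to assigning each document to the most similar virtual document under cosine similarity, with raw term counts as weights; the abstract does not specify the weighting scheme, so that choice, the function names `cosine` and `classify`, and the toy data are all hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(doc_tokens, virtual_docs):
    """Return the label of the virtual document most similar to doc_tokens.

    virtual_docs maps a version label to a bag of words (list of tokens).
    Raw term counts serve as weights; this is an assumption, not the
    weighting used in the paper.
    """
    doc_vec = Counter(doc_tokens)
    scores = {
        label: cosine(doc_vec, Counter(words))
        for label, words in virtual_docs.items()
    }
    return max(scores, key=scores.get)


# Hypothetical virtual documents, one bag of words per version of an event.
virtual_docs = {
    "version_a": ["accident", "brake", "failure", "mechanical"],
    "version_b": ["driver", "speeding", "alcohol", "arrest"],
}

doc = "police said the driver was speeding before the arrest".split()
label = classify(doc, virtual_docs)
print(label)
```

In this toy case the document shares three terms with the second bag of words and none with the first, so it is assigned to `version_b`. A fuller implementation would likely use TF-IDF weights and stop-word removal, which the sketch omits for brevity.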