We propose a novel probabilistic technique for modeling and extracting salient structure from large document collections. As in clustering and topic modeling, our goal is to provide an organizing perspective on otherwise overwhelming amounts of information. We are particularly interested in revealing and exploiting relationships between documents. To this end, we focus on extracting diverse sets of threads: singly linked, coherent chains of important documents. To illustrate, we extract research threads from citation graphs and construct timelines from news articles. Our method is highly scalable, running on a corpus of over 30 million words in about four minutes, more than 75 times faster than a dynamic topic model. Finally, the results from our model more closely resemble human news summaries according to several metrics and are also preferred by human judges.
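To make the notion of a "diverse set of threads" concrete, the following minimal sketch picks chains from a tiny document graph. It is not the probabilistic model proposed here: it swaps in a plain greedy heuristic that scores each chain by summed document quality and link coherence, then penalizes chains that reuse documents already covered. The names `quality`, `coherence`, `overlap_penalty`, and all three functions are hypothetical, and the graph is assumed to be acyclic (for example, edges pointing forward in time).

```python
def enumerate_threads(coherence, length):
    """Return every chain of `length` documents that follows the directed edges."""
    successors = {}
    for u, v in coherence:
        successors.setdefault(u, []).append(v)
    nodes = sorted({n for edge in coherence for n in edge})
    threads = [(n,) for n in nodes]
    for _ in range(length - 1):
        # Extend each partial chain by one outgoing edge; dead ends drop out.
        threads = [t + (v,) for t in threads for v in successors.get(t[-1], [])]
    return threads


def thread_score(thread, quality, coherence):
    """Sum of per-document quality plus coherence of consecutive links."""
    node_part = sum(quality[d] for d in thread)
    edge_part = sum(coherence[(a, b)] for a, b in zip(thread, thread[1:]))
    return node_part + edge_part


def select_diverse_threads(quality, coherence, length, k, overlap_penalty=1.0):
    """Greedy heuristic (an assumption, not the paper's method): repeatedly take
    the best-scoring chain, charging a penalty per already-covered document."""
    candidates = enumerate_threads(coherence, length)
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, float("-inf")
        for t in candidates:
            if t in chosen:
                continue
            gain = thread_score(t, quality, coherence)
            gain -= overlap_penalty * len(covered & set(t))
            if gain > best_gain:
                best, best_gain = t, gain
        if best is None:
            break
        chosen.append(best)
        covered.update(best)
    return chosen


# Toy data: six documents, edges weighted by how coherent the transition is.
quality = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 0.9, "e": 0.9, "f": 0.9}
coherence = {
    ("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "e"): 0.9,
    ("d", "e"): 1.0, ("e", "f"): 1.0, ("b", "f"): 0.5,
}
threads = select_diverse_threads(quality, coherence, length=3, k=2, overlap_penalty=2.0)
print(threads)  # [('a', 'b', 'c'), ('d', 'e', 'f')]
```

With the penalty in place, the second pick avoids chains that share documents with the first, which is the intuition behind asking for diversity; exhaustive enumeration like this only works on toy graphs and says nothing about how the reported scalability is achieved.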