We present a new threshold-based clustering algorithm for news articles. The algorithm consists of two phases. In the first phase, an Expectation-Maximization approach finds a local optimum of a score function that captures the quality of a clustering. In the second phase, the algorithm reduces the number of clusters and, in particular, is able to build non-spherically shaped clusters. We also give a mini-batch version that allows efficient dynamic processing of data points as they arrive in groups. Our experiments on the TDT5 benchmark collection show that both versions of the algorithm outperform other state-of-the-art alternatives.
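The two-phase structure described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the score function, the threshold rule, or the merge criterion. The sketch assumes cosine similarity on unit-length document vectors, a first phase that alternates between assigning each document to its most similar centroid (opening a new cluster when no centroid reaches `threshold`) and recomputing centroids, and a second phase that merges clusters linked by at least one sufficiently similar pair of documents. The names `phase_one`, `phase_two`, `threshold`, and `link_threshold` are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def centroid(vectors):
    """Unit-length mean direction of a non-empty list of vectors."""
    dim = len(vectors[0])
    return normalize([sum(v[i] for v in vectors) for i in range(dim)])

def phase_one(docs, threshold, max_iter=20):
    """Assumed EM-style loop: assign documents, then recompute centroids."""
    centroids, labels = [], [-1] * len(docs)
    for _ in range(max_iter):
        new_labels = []
        for d in docs:
            # Assignment step: pick the most similar centroid above the threshold.
            best, best_sim = -1, threshold
            for k, c in enumerate(centroids):
                s = dot(d, c)
                if s >= best_sim:
                    best, best_sim = k, s
            if best == -1:
                # No centroid is similar enough: open a new cluster.
                centroids.append(d)
                best = len(centroids) - 1
            new_labels.append(best)
        # Update step: drop empty clusters, renumber, recompute centroids.
        used = sorted(set(new_labels))
        remap = {old: new for new, old in enumerate(used)}
        new_labels = [remap[l] for l in new_labels]
        centroids = [
            centroid([d for d, l in zip(docs, new_labels) if l == k])
            for k in range(len(used))
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels

def phase_two(docs, labels, link_threshold):
    """Assumed merge step: join clusters connected by one similar document pair."""
    parent = list(range(max(labels) + 1))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            ri, rj = find(labels[i]), find(labels[j])
            if ri != rj and dot(docs[i], docs[j]) >= link_threshold:
                parent[rj] = ri
    # Renumber merged clusters consecutively in order of first appearance.
    roots = {}
    return [roots.setdefault(find(l), len(roots)) for l in labels]
```

Because the merge step in this sketch follows chains of similar documents rather than distances to a single centroid, the merged clusters can be elongated, which is one way to obtain non-spherical shapes. The pairwise loop is quadratic in the number of documents and is only meant for illustration.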