Extracting significant time varying features from text
Proceedings of the eighth international conference on Information and knowledge management
Automatic generation of overview timelines
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to topic detection and tracking
Topic detection and tracking
Bursty and hierarchical structure in streams
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Accurate methods for the statistics of surprise and coincidence
Computational Linguistics - Special issue on using large corpora: I
Text classification and named entities for new event detection
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Topics over time: a non-Markov continuous-time model of topical trends
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Using topic salience and connotational drifts to detect candidates to semantic change
IWCS '11 Proceedings of the Ninth International Conference on Computational Semantics
Hi-index | 0.00 |
We present an approach for automatic detection of topic change. Our approach is based on the analysis of statistical features of topics in time-sliced corpora and their dynamics over time. Processing large amounts of time-annotated news text, we identify new facets regarding a stream of topics consisting of latest news of public interest. Adaptable as an addition to the well known task of topic detection and tracking we aim to boil down a daily news stream to its novelty. For that we examine the contextual shift of the concepts over time slices. To quantify the amount of change, we adopt the volatility measure from econometrics and propose a new algorithm for frequency-independent detection of topic drift and change of meaning. The proposed measure does not rely on plain word frequency but the mixture of the co-occurrences of words. So, the analysis is highly independent of the absolute word frequencies and works over the whole frequency spectrum, especially also well for low-frequent words. Aggregating the computed time-related data of the terms allows to build overview illustrations of the most evolving terms for a whole time span.