Towards automatic detection and tracking of topic change

Authors:
Florian Holz;Sven Teresniak
Affiliations:
NLP Group, Department of Computer Science, University of Leipzig;NLP Group, Department of Computer Science, University of Leipzig
Venue:
CICLing'10 Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2010

Citing 7
Cited 1

Extracting significant time varying features from text

Proceedings of the eighth international conference on Information and knowledge management
Automatic generation of overview timelines

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to topic detection and tracking

Topic detection and tracking
Bursty and hierarchical structure in streams

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Accurate methods for the statistics of surprise and coincidence

Computational Linguistics - Special issue on using large corpora: I
Text classification and named entities for new event detection

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Topics over time: a non-Markov continuous-time model of topical trends

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

Using topic salience and connotational drifts to detect candidates to semantic change

IWCS '11 Proceedings of the Ninth International Conference on Computational Semantics

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present an approach for automatic detection of topic change. Our approach is based on the analysis of statistical features of topics in time-sliced corpora and their dynamics over time. Processing large amounts of time-annotated news text, we identify new facets regarding a stream of topics consisting of latest news of public interest. Adaptable as an addition to the well known task of topic detection and tracking we aim to boil down a daily news stream to its novelty. For that we examine the contextual shift of the concepts over time slices. To quantify the amount of change, we adopt the volatility measure from econometrics and propose a new algorithm for frequency-independent detection of topic drift and change of meaning. The proposed measure does not rely on plain word frequency but the mixture of the co-occurrences of words. So, the analysis is highly independent of the absolute word frequencies and works over the whole frequency spectrum, especially also well for low-frequent words. Aggregating the computed time-related data of the terms allows to build overview illustrations of the most evolving terms for a whole time span.