As document collections accumulate over time, some of the subjects discussed in them fall out of fashion, while new ones emerge. In this paper, we address the challenge of finding such emerging and persistent "themes", i.e. subjects that live long enough to be incorporated into a taxonomy or ontology describing the document collection. Our method is based on similarity-based clustering and cluster label construction. It focuses on identifying cluster labels that "survive" changes in the composition of the underlying document population, including changes in the feature space of dominant words. We conducted a set of experiments, with promising results, on identifying themes that manifested themselves in the ACM library within the last decade.
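To make the notion of a label that "survives" more concrete, here is a minimal sketch and not the method of the paper. It assumes that a cluster label is the set of the most frequent words in a cluster, and that a label survives into the next period if some label there has a Jaccard overlap of at least `min_overlap` with it. The function names, `top_k`, and the threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def build_label(cluster_docs, top_k=3):
    """Label a cluster with its top_k most frequent words.

    Assumption: documents are whitespace-tokenized strings; ties in
    frequency are broken alphabetically so the result is deterministic.
    """
    counts = Counter(word for doc in cluster_docs for word in doc.split())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return frozenset(word for word, _ in ranked[:top_k])


def jaccard(a, b):
    """Jaccard similarity of two word sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def surviving_labels(labels_by_period, min_overlap=0.5):
    """Return the first-period labels that can be traced through every later period.

    Assumption: a label is matched to the most similar label of the next
    period and is considered alive if that similarity reaches min_overlap.
    The matched label then becomes the reference for the following period,
    so the dominant words may drift gradually over time.
    """
    survivors = []
    for label in labels_by_period[0]:
        current, alive = label, True
        for later in labels_by_period[1:]:
            best = max(later, key=lambda cand: jaccard(current, cand), default=None)
            if best is None or jaccard(current, best) < min_overlap:
                alive = False
                break
            current = best
        if alive:
            survivors.append(label)
    return survivors


# Illustrative, made-up labels for two consecutive periods.
periods = [
    [frozenset({"clustering", "data", "mining"}), frozenset({"xml", "schema", "dtd"})],
    [frozenset({"clustering", "data", "streams"}), frozenset({"web", "search", "ranking"})],
]
persistent = surviving_labels(periods)
```

In this toy input only the label around "clustering" and "data" is carried over, because two of its three words remain dominant in the second period. Matching against the most recent label rather than the original one is a design choice that tolerates gradual change in the feature space.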