We study how different clustering algorithms influence cluster evolution monitoring in data streams. Capturing and interpreting cluster change yields indicators of how the underlying population evolves. For text stream monitoring, the clusters can be summarized into topics, so cluster monitoring provides insights into the emergence and decline of thematic subjects over time. However, such insights should always be taken with a grain of salt: the quality of the clusters has a decisive impact on the observed changes. In the simplest case, a cluster may appear to change across the stream because the original cluster was of low quality, rather than because the population belonging to it drifted.

We present our framework ThemeFinder for monitoring topic evolution in streams and compare how the quality of two very different clustering algorithms influences the monitoring results. After evaluating several clustering algorithms with external and internal quality measures, we use the center-based bisecting k-means algorithm and the density-based DBScan algorithm. Our results show that this influence is relatively high, and that the results of one clustering algorithm allow conclusions to be drawn about the evaluation of the other. Our experiments were conducted on a subarchive of the ACM library.
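The abstract names bisecting k-means as the center-based algorithm but does not describe how it is configured. The sketch below shows the generic technique only, not ThemeFinder's implementation. It assumes Euclidean distance on dense numeric vectors (the paper clusters text, presumably as term vectors, possibly with a different similarity), and it assumes the variant that always splits the cluster with the largest sum of squared errors (SSE). The original formulation splits the largest cluster instead. The function names `bisecting_kmeans` and `_two_means` and the `trials` parameter are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def _sse(points: List[Point]) -> float:
    """Sum of squared distances to the centroid (within-cluster scatter)."""
    c = _centroid(points)
    return sum(_dist2(p, c) for p in points)


def _two_means(points: List[Point], rng: random.Random, iters: int = 20):
    """One run of Lloyd's 2-means, seeded with two randomly chosen points."""
    c1, c2 = (list(p) for p in rng.sample(points, 2))
    left: List[Point] = list(points)
    right: List[Point] = []
    for _ in range(iters):
        left = [p for p in points if _dist2(p, c1) <= _dist2(p, c2)]
        right = [p for p in points if _dist2(p, c1) > _dist2(p, c2)]
        if not left or not right:
            break  # degenerate split (e.g. duplicate seeds); caller discards it
        new1, new2 = _centroid(left), _centroid(right)
        if new1 == c1 and new2 == c2:
            break  # centroids stopped moving: converged
        c1, c2 = new1, new2
    return left, right


def bisecting_kmeans(points: List[Point], k: int, trials: int = 5,
                     seed: int = 0) -> List[List[Point]]:
    """Repeatedly bisect one cluster with 2-means until k clusters exist."""
    rng = random.Random(seed)
    clusters: List[List[Point]] = [list(points)]
    while len(clusters) < k:
        candidates = [i for i, c in enumerate(clusters) if len(c) >= 2]
        if not candidates:
            break  # every cluster is a singleton
        # Assumed variant: split the cluster with the largest SSE.
        idx = max(candidates, key=lambda i: _sse(clusters[i]))
        best: Optional[Tuple[float, List[Point], List[Point]]] = None
        # Run 2-means several times and keep the split with the lowest total SSE.
        for _ in range(trials):
            left, right = _two_means(clusters[idx], rng)
            if left and right:
                score = _sse(left) + _sse(right)
                if best is None or score < best[0]:
                    best = (score, left, right)
        if best is None:
            break  # no valid split found (e.g. all points identical)
        _, left, right = best
        clusters[idx:idx + 1] = [left, right]  # replace parent by its two children
    return clusters


if __name__ == "__main__":
    docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(sorted(sorted(c) for c in bisecting_kmeans(docs, 2)))
    # prints [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]]
```

Running several 2-means trials per bisection and keeping the lowest-SSE split is a common way to reduce sensitivity to the random seeds. Even so, the final partition still depends on initialization and on the split-selection rule. This is one concrete source of the cluster-quality effects on observed change that the abstract warns about.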