Short text message streams are produced by Instant Messaging and the Short Message Service, both of which are widely used today. A stream usually contains more than one thread. Detecting threads in these streams is helpful to various applications, such as business intelligence, criminal investigation, and public opinion analysis. Existing approaches, which are mainly based on text similarity, face several challenges, including sparse feature vectors and the anomalous nature of short text messages. This paper introduces the novel concept of contextual correlation, in place of traditional text similarity, into the single-pass clustering algorithm to address the challenges of thread detection. We first analyze the contextually correlative nature of conversations in short text message streams, and then propose an unsupervised method to compute the degree of correlation. As a reference, a single-pass algorithm employing contextual correlation is developed to detect threads in massive short text streams. Experiments on large real-life online chat logs show that our approach improves performance by 11% in terms of the F1 measure when compared with the best similarity-based algorithm.
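
To make the single-pass setting concrete, the following is a minimal sketch of generic single-pass clustering over a message stream. It is not the method evaluated in the paper: the abstract does not define how contextual correlation is computed, so `placeholder_score` is a hypothetical stand-in (word overlap with a thread's latest message, damped by the time gap). The `Message` and `Thread` types, the 60-second damping constant, and the `threshold` value are likewise assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Message:
    timestamp: float  # seconds since the start of the stream (assumed unit)
    sender: str
    text: str


@dataclass
class Thread:
    messages: List[Message] = field(default_factory=list)


def placeholder_score(msg: Message, thread: Thread) -> float:
    """Hypothetical stand-in for a correlation score, NOT the paper's measure.

    Jaccard word overlap with the thread's latest message, divided by a
    factor that grows with the time gap between the two messages.
    """
    last = thread.messages[-1]
    a = set(msg.text.lower().split())
    b = set(last.text.lower().split())
    union = a | b
    overlap = len(a & b) / len(union) if union else 0.0
    gap = max(msg.timestamp - last.timestamp, 0.0)
    return overlap / (1.0 + gap / 60.0)


def single_pass(
    messages: List[Message],
    score: Callable[[Message, Thread], float] = placeholder_score,
    threshold: float = 0.1,
) -> List[Thread]:
    """Assign each message, in arrival order, to its best-scoring thread.

    A new thread is opened when no existing thread reaches the threshold.
    Each message is examined exactly once, hence "single-pass".
    """
    threads: List[Thread] = []
    for msg in messages:
        best, best_score = None, 0.0
        for thread in threads:
            s = score(msg, thread)
            if best is None or s > best_score:
                best, best_score = thread, s
        if best is None or best_score < threshold:
            best = Thread()
            threads.append(best)
        best.messages.append(msg)
    return threads


stream = [
    Message(0.0, "ann", "anyone want lunch today"),
    Message(5.0, "bob", "server build is broken"),
    Message(10.0, "cat", "lunch today sounds good"),
    Message(15.0, "ann", "who broke the server build"),
]
threads = single_pass(stream)
for i, thread in enumerate(threads):
    print(i, [m.text for m in thread.messages])
```

Because the scoring function is passed in as a parameter, a similarity-based score and a context-based score can be compared under the same clustering loop, which mirrors the comparison the abstract describes.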