Text message streams are a newly emerging type of Web data, produced in enormous quantities owing to the popularity of Instant Messaging and Internet Relay Chat. Detecting the threads contained in such streams benefits various applications, including information retrieval, expert recognition, and even crime prevention. Despite the importance of this problem, little research has been conducted on it so far, largely because the messages are usually very short and incomplete. In this paper, we present a stringent definition of the thread detection task and our preliminary solution to it. We propose three variations of a single-pass clustering algorithm that exploit the temporal information in the streams. We also put forward an algorithm based on linguistic features that exploits discourse structure information. We conducted several experiments on a real dataset to compare our approaches with some existing algorithms. The results show that all three variations outperform the basic single-pass algorithm. In terms of F1, our proposed algorithm based on linguistic features achieves relative improvements of 69.5% over the basic single-pass algorithm and 9.7% over the best variation.
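The abstract does not spell out the three variations, so the following is only a minimal sketch of how a single-pass clustering algorithm might use temporal information for thread detection. Everything specific here is an assumption made for illustration and not taken from the paper: the bag-of-words cosine similarity, the exponential time decay, and the hypothetical names `single_pass_threads`, `threshold`, and `half_life`.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_threads(messages, threshold=0.2, half_life=60.0):
    """Assign each message to a thread in one pass over the stream.

    messages  : list of (timestamp_in_seconds, text) in arrival order
    threshold : assumed minimum decayed similarity for joining a thread
    half_life : assumed number of seconds after which the similarity
                to an idle thread is halved
    Returns a list with one thread id per message.
    """
    threads = []  # each thread keeps a term-count centroid and its last timestamp
    labels = []
    for ts, text in messages:
        vec = Counter(text.lower().split())
        best_id, best_score = None, 0.0
        for tid, thread in enumerate(threads):
            # Temporal information: threads that have been idle longer score lower.
            decay = 0.5 ** ((ts - thread["last_time"]) / half_life)
            score = cosine(vec, thread["centroid"]) * decay
            if score > best_score:
                best_id, best_score = tid, score
        if best_id is None or best_score < threshold:
            # No sufficiently similar thread: start a new one.
            threads.append({"centroid": Counter(vec), "last_time": ts})
            labels.append(len(threads) - 1)
        else:
            # Join the best thread and fold the message into its centroid.
            threads[best_id]["centroid"].update(vec)
            threads[best_id]["last_time"] = ts
            labels.append(best_id)
    return labels


if __name__ == "__main__":
    stream = [
        (0, "anyone know how to install python"),
        (5, "what time is the game tonight"),
        (10, "install python with the official installer"),
        (15, "the game starts at eight tonight"),
    ]
    print(single_pass_threads(stream))  # → [0, 1, 0, 1]
```

In this toy stream the two interleaved conversations are separated because each message is most similar, after time decay, to the thread that shares its vocabulary. With very short messages the term overlap is often too sparse for this to work reliably, which is the difficulty the abstract points to and the motivation for adding linguistic features.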