Thread detection in dynamic text message streams

Authors:
Dou Shen;Qiang Yang;Jian-Tao Sun;Zheng Chen
Affiliations:
Hong Kong University of Science and Technology;Hong Kong University of Science and Technology;Microsoft Research Asia, Beijing, P.R.China;Microsoft Research Asia, Beijing, P.R.China
Venue:
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2006

Citing 12
Cited 21

On the application of syntactic methodologies in automatic text analysis

SIGIR '89 Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval
Automatic text processing: the transformation, analysis, and retrieval of information by computer

Automatic text processing: the transformation, analysis, and retrieval of information by computer
Progress in the application of natural language processing to information retrieval tasks

The Computer Journal - Special issue on information retrieval
A study of retrospective and on-line event detection

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
An investigation of linguistic features and clustering algorithms for topical document clustering

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
A Dynamic Probabilistic Model to Visualise Topic Evolution in Text Streams

Journal of Intelligent Information Systems
Topic Identification in Dynamical Text by Complexity Pursuit

Neural Processing Letters
Domain-independent text segmentation using anisotropic diffusion and dynamic programming

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Text classification and named entities for new event detection

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Learning effective ranking functions for newsgroup search

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Combining Topic Models and Social Networks for Chat Data Mining

WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence
Topic segmentation of message hierarchies for indexing and navigation support

WWW '05 Proceedings of the 14th international conference on World Wide Web

NaLIX: A generic natural language search environment for XML data

ACM Transactions on Database Systems (TODS)
Comments-oriented blog summarization by sentence extraction

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Simultaneously modeling semantics and structure of threaded discussions: a sparse coding approach and its applications

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Bounding and comparing methods for correlation clustering beyond ILP

ILP '09 Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing
Context-based message expansion for disentanglement of interleaved text conversations

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Using universal linguistic knowledge to guide grammar induction

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Disentangling chat

Computational Linguistics
Disentangling chat with local coherence models

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Learning online discussion structures by conditional random fields

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Dynamically Modeling Semantic Dependencies in Web Forum Threads

WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
Online conversation mining for author characterization and topic identification

Proceedings of the 4th workshop on Workshop for Ph.D. students in information & knowledge management
Contextual correlation based thread detection in short text message streams

Journal of Intelligent Information Systems
Multiple narrative disentanglement: unraveling Infinite Jest

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
AttitudeMiner: mining attitude from online discussions

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstration Session
Discovering habits of effective online support group chatrooms

Proceedings of the 17th ACM international conference on Supporting group work
Subgroup detector: a system for detecting subgroups in online discussions

ACL '12 Proceedings of the ACL 2012 System Demonstrations
Hierarchical conversation structure prediction in multi-party chat

SIGDIAL '12 Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Extracting signed social networks from text

TextGraphs-7 '12 Workshop Proceedings of TextGraphs-7 on Graph-based Methods for Natural Language Processing
An Evolutionary-Based Method for Reconstructing Conversation Threads in Email Corpora

ASONAM '12 Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012)
A learning approach for email conversation thread reconstruction

Journal of Information Science
Mining Top-K Rank Frequent Patterns in Data Streams A Tree Based Approach with Ternary Function and Ternary Feature Vector

Proceedings of the Second International Conference on Innovative Computing and Cloud Computing

Quantified Score

Hi-index	0.01

Abstract

Text message streams are a newly emerging type of Web data, produced in enormous quantities with the popularity of Instant Messaging and Internet Relay Chat. Detecting the threads contained in such a stream benefits various applications, including information retrieval, expert recognition and even crime prevention. Despite its importance, little research has been conducted on this problem so far, because the messages are usually very short and incomplete. In this paper, we present a stringent definition of the thread detection task and our preliminary solution to it. We propose three variations of a single-pass clustering algorithm that exploit the temporal information in the streams. We also put forward an algorithm based on linguistic features that exploits discourse structure information. We conducted several experiments to compare our approaches with some existing algorithms on a real dataset. The results show that all three variations outperform the basic single-pass algorithm. In terms of F1, our algorithm based on linguistic features achieves relative improvements of 69.5% over the basic single-pass algorithm and 9.7% over the best variation.
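
To make the idea of single-pass clustering over a message stream concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify the three variations, so the exponential time decay, the bag-of-words cosine similarity, and the names `single_pass_threads`, `threshold`, and `half_life` are all illustrative assumptions. Each incoming message is compared once against the existing threads, then either joins the best match or starts a new thread.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_threads(messages, threshold=0.2, half_life=60.0):
    """Assign a thread id to each (timestamp_seconds, text) message.

    Messages must be given in arrival order. The similarity to a thread
    is discounted by an exponential decay on the time since the thread's
    last message (an assumed way of using temporal information).
    """
    threads = []  # each thread: {"centroid": Counter, "last_time": float}
    labels = []
    for ts, text in messages:
        vec = Counter(tokenize(text))
        best_id, best_score = None, 0.0
        for i, thread in enumerate(threads):
            decay = 0.5 ** ((ts - thread["last_time"]) / half_life)
            score = cosine(vec, thread["centroid"]) * decay
            if score > best_score:
                best_id, best_score = i, score
        if best_id is None or best_score < threshold:
            # No thread is similar enough: start a new one.
            threads.append({"centroid": vec.copy(), "last_time": ts})
            labels.append(len(threads) - 1)
        else:
            # Merge the message into the best-matching thread.
            threads[best_id]["centroid"].update(vec)
            threads[best_id]["last_time"] = ts
            labels.append(best_id)
    return labels


if __name__ == "__main__":
    stream = [
        (0, "anyone know how to install python"),
        (5, "what time is the football match"),
        (10, "install python with the official installer"),
        (12, "the football match starts at eight"),
    ]
    print(single_pass_threads(stream))
```

Under these assumptions, setting `half_life` to a very large value makes the decay negligible and reduces the sketch to a plain content-only single-pass clusterer, which is one way to compare the effect of the temporal signal.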