Topic segmentation and labeling in asynchronous conversations

Authors:
Shafiq Joty; Giuseppe Carenini; Raymond T. Ng
Affiliations:
Qatar Computing Research Institute, Qatar Foundation, Doha, Qatar; University of British Columbia, Vancouver, BC, Canada; University of British Columbia, Vancouver, BC, Canada
Venue:
Journal of Artificial Intelligence Research
Year:
2013

Abstract

Topic segmentation and labeling is often considered a prerequisite for higher-level conversation analysis and has been shown to be useful in many Natural Language Processing (NLP) applications. We present two new corpora of email and blog conversations annotated with topics, and evaluate annotator reliability for the segmentation and labeling tasks in these asynchronous conversations. We propose a complete computational framework for topic segmentation and labeling in asynchronous conversations. Our approach extends state-of-the-art methods by taking into account the fine-grained structure of an asynchronous conversation, along with other conversational features, and by applying recent graph-based methods for NLP. For topic segmentation, we propose two novel unsupervised models that exploit the fine-grained conversational structure, and a novel graph-theoretic supervised model that combines lexical, conversational, and topic features. For topic labeling, we propose two novel unsupervised random walk models that capture conversation-specific clues from two different sources, respectively: the leading sentences and the fine-grained conversational structure. Empirical evaluation shows that the segmentation and labeling produced by our best models outperform the state of the art and are highly correlated with human annotations.
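The abstract mentions random walk models for topic labeling that draw on the leading sentences of a topic segment, but it does not give their details. As a rough illustration of the general idea only, the sketch below ranks candidate label words with a biased (personalized) PageRank-style random walk over a word co-occurrence graph, giving extra restart weight to words from the leading sentences. This is not the authors' model: the graph construction, the function names (`tokenize`, `rank_words`), the stopword list, and the parameters (`leading_count`, `lead_boost`, `damping`) are all assumptions made for this example.

```python
import re
from collections import defaultdict
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in",
             "on", "for", "it", "we", "i", "this", "that", "be", "with"}


def tokenize(sentence):
    """Lowercase a sentence and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def rank_words(sentences, leading_count=1, lead_boost=2.0,
               damping=0.85, iterations=50):
    """Rank words by a biased random walk on a sentence co-occurrence graph.

    Words from the first `leading_count` sentences get `lead_boost` times
    the restart weight of other words (hypothetical weighting scheme).
    Returns (word, score) pairs sorted by descending score.
    """
    tokens = [tokenize(s) for s in sentences]
    vocab = sorted({w for sent in tokens for w in sent})
    if not vocab:
        return []

    # Undirected edges weighted by how often two words share a sentence.
    weights = defaultdict(lambda: defaultdict(float))
    for sent in tokens:
        for u, v in combinations(sorted(set(sent)), 2):
            weights[u][v] += 1.0
            weights[v][u] += 1.0

    # Restart distribution biased toward words in the leading sentences.
    lead_words = {w for sent in tokens[:leading_count] for w in sent}
    bias = {w: (lead_boost if w in lead_words else 1.0) for w in vocab}
    total = sum(bias.values())
    bias = {w: b / total for w, b in bias.items()}

    scores = dict(bias)
    for _ in range(iterations):
        spread = {w: 0.0 for w in vocab}
        dangling = 0.0  # mass held by words that have no neighbours
        for u in vocab:
            out = sum(weights[u].values())
            if out == 0.0:
                dangling += scores[u]
                continue
            for v, wt in weights[u].items():
                spread[v] += scores[u] * wt / out
        # Dangling mass is redistributed according to the restart distribution.
        scores = {w: (1 - damping) * bias[w]
                  + damping * (spread[w] + dangling * bias[w])
                  for w in vocab}

    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    segment = [
        "The server upgrade broke the login page",
        "Login fails after the upgrade",
        "Maybe restart the server",
    ]
    for word, score in rank_words(segment)[:3]:
        print(f"{word}\t{score:.3f}")
```

The scores form a probability distribution over the vocabulary, so the top-ranked words can be read as candidate label terms for the segment. A fuller system would rank phrases rather than single words and would also need a second source of evidence, such as the conversational structure the abstract refers to.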