Attention, intentions, and the structure of discourse
Computational Linguistics
Statistical Models for Text Segmentation
Machine Learning - Special issue on natural language learning
Foundations of statistical natural language processing
Foundations of statistical natural language processing
A critique and improvement of an evaluation metric for text segmentation
Computational Linguistics
The Journal of Machine Learning Research
Empirical studies on the disambiguation of cue phrases
Computational Linguistics
Advances in domain independent linear text segmentation
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Intention-based segmentation: human reliability and correlation with linguistic cues
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Combining multiple knowledge sources for discourse segmentation
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Multi-paragraph segmentation of expository text
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
A statistical model for domain-independent text segmentation
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Entropy rate constancy in text
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Discourse segmentation of multi-party conversation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Broad coverage paragraph segmentation across languages and domains
ACM Transactions on Speech and Language Processing (TSLP)
Pattern Recognition and Machine Learning (Information Science and Statistics)
Pattern Recognition and Machine Learning (Information Science and Statistics)
Incorporating non-local information into information extraction systems by Gibbs sampling
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Unsupervised topic modelling for multi-party spoken discourse
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Minimum cut model for spoken lecture segmentation
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Context-based message expansion for disentanglement of interleaved text conversations
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Hierarchical text segmentation from multi-scale lexical cohesion
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Exploring content models for multi-document summarization
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Global models of document structure using latent permutations
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Learning semantic correspondences with less supervision
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Participant subjectivity and involvement as a basis for discourse segmentation
SIGDIAL '09 Proceedings of the SIGDIAL 2009 Conference: The 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Content modeling using latent permutations
Journal of Artificial Intelligence Research
Evaluating hierarchical discourse segmentation
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Unsupervised discourse segmentation of documents with inherently parallel structure
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Multi-document topic segmentation
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
A rhetorical syntax-driven model for speech summarization
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Improving text segmentation with non-systematic semantic relation
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part I
An iterative approach to text segmentation
ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
SciSumm: a multi-document summarization system for scientific articles
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Systems Demonstrations
Discovery of topically coherent sentences for extractive summarization
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Disentangling chat with local coherence models
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Unsupervised segmentation of bibliographic elements with latent permutations
WISS'10 Proceedings of the 2010 international conference on Web information systems engineering
Linear text segmentation using affinity propagation
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Topical segmentation: a study of human performance and a new measure of quality
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Discourse structure and computation: past, present and future
ACL '12 Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Modelling sequential text with an adaptive topic model
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Two-part segmentation of text documents
Proceedings of the 21st ACM international conference on Information and knowledge management
Discourse structure and language technology
Natural Language Engineering
Unsupervised Segmentation of Bibliographic Elements with Latent Permutations
International Journal of Organizational and Collective Intelligence
An unsupervised topic segmentation model incorporating word order
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Triggering effective social support for online groups
ACM Transactions on Interactive Intelligent Systems (TiiS)
On handling textual errors in latent document modeling
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Topic segmentation and labeling in asynchronous conversations
Journal of Artificial Intelligence Research
Integrated Computer-Aided Engineering
Hi-index | 0.00 |
This paper describes a novel Bayesian approach to unsupervised topic segmentation. Unsupervised systems for this task are driven by lexical cohesion: the tendency of well-formed segments to induce a compact and consistent lexical distribution. We show that lexical cohesion can be placed in a Bayesian context by modeling the words in each topic segment as draws from a multinomial language model associated with the segment; maximizing the observation likelihood in such a model yields a lexically-cohesive segmentation. This contrasts with previous approaches, which relied on hand-crafted cohesion metrics. The Bayesian framework provides a principled way to incorporate additional features such as cue phrases, a powerful indicator of discourse structure that has not been previously used in unsupervised segmentation systems. Our model yields consistent improvements over an array of state-of-the-art systems on both text and speech datasets. We also show that both an entropy-based analysis and a well-known previous technique can be derived as special cases of the Bayesian framework.