Multi-document topic segmentation

Authors:
Minwoo Jeong; Ivan Titov
Affiliations:
Saarland University, Saarbrücken, Germany; Saarland University, Saarbrücken, Germany
Venue:
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Year:
2010


Abstract

Multiple documents describing the same or closely related sets of events are common and often easy to obtain: for example, consider document clusters on a news aggregator site or multiple reviews of the same product or service. Even though such documents discuss a similar set of topics, they provide alternative views or complementary information on each of these topics. We argue that revealing these hidden relations by jointly segmenting the documents or, equivalently, predicting links between topically related segments in different documents, would help to visualize documents of interest and construct friendlier user interfaces. In this paper, we refer to this problem as multi-document topic segmentation. We propose an unsupervised Bayesian model for this problem that models both shared and document-specific topics and uses Dirichlet process priors to determine the effective number of topics. We show that topic segmentation can be inferred efficiently using a simple split-merge sampling algorithm. The resulting method outperforms baseline models on four datasets for multi-document topic segmentation.
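
As an illustration of how a Dirichlet process prior leaves the number of topics open, the following minimal sketch draws topic labels for a sequence of segments from a Chinese restaurant process, a standard representation of the Dirichlet process. It is not the model or the split-merge sampler described in the abstract; the function name `sample_crp_assignments` and the parameters `num_segments`, `alpha`, and `seed` are assumptions made for this example only.

```python
import random


def sample_crp_assignments(num_segments, alpha, seed=0):
    """Draw topic labels for segments from a Chinese restaurant process.

    Hypothetical helper for illustration; alpha > 0 is the concentration
    parameter (larger values tend to produce more distinct topics).
    """
    rng = random.Random(seed)
    assignments = []  # assignments[i] = topic label of segment i
    counts = []       # counts[k] = number of segments assigned to topic k

    for _ in range(num_segments):
        # An existing topic k is chosen with weight counts[k];
        # a brand-new topic is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new topic
        else:
            counts[k] += 1     # join an existing topic
        assignments.append(k)

    return assignments, counts


if __name__ == "__main__":
    labels, sizes = sample_crp_assignments(num_segments=20, alpha=1.0, seed=1)
    print("topic labels:", labels)
    print("number of topics used:", len(sizes))
```

The number of distinct labels is not fixed in advance: it emerges from the draws and grows slowly with the number of segments, which is the property that lets a model of this kind settle on an effective number of topics from the data.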