In this paper, we approach the task of text segmentation from a topic modeling perspective. We investigate the use of two unsupervised topic models, latent Dirichlet allocation (LDA) and multinomial mixture (MM), to segment a text into semantically coherent parts. The proposed topic-model-based approaches consistently outperform a standard baseline method on several datasets. A major benefit of the LDA-based approach is that, along with the segment boundaries, it outputs the topic distribution associated with each segment. This information is of potential use in applications such as segment retrieval and discourse analysis. However, the proposed approaches, especially the LDA-based method, have high computational requirements. Based on an analysis of the dynamic programming (DP) algorithm typically used for segmentation, we suggest a modification to DP that dramatically speeds up the process with no loss in performance. The modification is not specific to topic models; it applies to any algorithm that uses DP for text segmentation.
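To make the role of DP concrete, the following is a minimal sketch of a generic DP for linear text segmentation. It is not the paper's method: the segment score here is a smoothed unigram negative log-likelihood chosen purely for illustration (the paper scores segments with LDA or MM), and the names `segment_cost`, `segment`, `alpha`, and `penalty` are hypothetical. The sketch also does not include the speed-up suggested in the abstract; it only shows the baseline recursion, which evaluates a score for every candidate span and is therefore expensive when each evaluation is costly.

```python
import math
from collections import Counter


def segment_cost(sentences, i, j, vocab_size, alpha=1.0):
    """Illustrative score (an assumption, not the paper's): negative
    log-likelihood of sentences[i:j] under an add-alpha smoothed unigram
    model estimated from that same span. Lower means more coherent."""
    counts = Counter(w for s in sentences[i:j] for w in s)
    denom = sum(counts.values()) + alpha * vocab_size
    return -sum(c * math.log((c + alpha) / denom) for c in counts.values())


def segment(sentences, penalty=2.0):
    """Return the end index of each segment (the last one is len(sentences)).

    best[j] holds the lowest total cost of segmenting sentences[:j];
    back[j] holds the start of the last segment in that segmentation.
    `penalty` is a hypothetical per-segment cost that discourages
    over-segmentation.
    """
    n = len(sentences)
    vocab_size = len({w for s in sentences for w in s})
    best = [0.0] + [math.inf] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):          # candidate segment end
        for i in range(j):             # candidate segment start
            cost = best[i] + segment_cost(sentences, i, j, vocab_size) + penalty
            if cost < best[j]:
                best[j], back[j] = cost, i
    # Follow the back-pointers from the end of the text to recover boundaries.
    bounds, j = [], n
    while j > 0:
        bounds.append(j)
        j = back[j]
    return sorted(bounds)


if __name__ == "__main__":
    # Toy input: each sentence is a list of tokens, three per topic.
    docs = [["cat", "dog"]] * 3 + [["stock", "bond"]] * 3
    print(segment(docs))
```

The double loop calls the scoring function once per (start, end) pair, so the number of score evaluations grows quadratically with the number of sentences. Under that assumption, any change that reduces the number of spans scored helps every DP-based segmenter, regardless of which scoring model is plugged in.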