We present a new unsupervised topic discovery model for collections of text documents. In contrast to most state-of-the-art topic models, our model does not break up the document's structure, such as paragraphs and sentences, and it preserves word order within the document. As a result, it generates topics at two levels of granularity, namely segment-topics and word-topics, and it can discover n-gram words within each topic. We also develop an approximate inference scheme based on Gibbs sampling. Extensive experiments on publicly available data from several collections show that our model improves the quality of a range of text mining tasks: supporting fine-grained topics with n-gram words in the correlation graph, segmenting a document into topically coherent sections, document classification, and document likelihood estimation.
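The abstract does not specify the two-level segment-topic/word-topic sampler, so the proposed model cannot be reproduced from this text alone. As a rough, hedged illustration of the kind of collapsed Gibbs sampling inference it refers to, the sketch below implements the standard collapsed sampler for plain LDA (a simpler bag-of-words model); every name and parameter here is an assumption for illustration, not the paper's actual algorithm.

```python
import random
from collections import defaultdict

def gibbs_lda(docs, n_topics, alpha=0.1, beta=0.01, n_iters=50, seed=0):
    """Collapsed Gibbs sampler for plain LDA (illustrative sketch only;
    the paper's model additionally handles segments and n-grams).

    docs: list of documents, each a list of integer word ids.
    Returns the final per-token topic assignments.
    """
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})     # vocabulary size
    n_dk = [[0] * n_topics for _ in docs]     # doc-topic counts
    n_kw = defaultdict(int)                   # (topic, word) -> count
    n_k = [0] * n_topics                      # topic totals
    z = []                                    # per-token assignments
    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[(k, w)] += 1
            n_k[k] += 1
        z.append(zd)
    # Gibbs sweeps: resample each token's topic from its collapsed
    # conditional given all other assignments.
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                n_dk[d][k] -= 1; n_kw[(k, w)] -= 1; n_k[k] -= 1
                weights = [(n_dk[d][t] + alpha) *
                           (n_kw[(t, w)] + beta) / (n_k[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1; n_kw[(k, w)] += 1; n_k[k] += 1
    return z
```

A segment-topic extension would add a second assignment variable per segment and condition the word-topic draw on it; this sketch only shows the word-level sampling step that both models share in spirit.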