An unsupervised topic segmentation model incorporating word order

Authors:
Shoaib Jameel; Wai Lam
Affiliations:
The Chinese University of Hong Kong, Hong Kong, Hong Kong; The Chinese University of Hong Kong, Hong Kong, Hong Kong
Venue:
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Year:
2013

Abstract

We present a new unsupervised topic discovery model for a collection of text documents. In contrast to the majority of state-of-the-art topic models, our model does not break the document's structure, such as paragraphs and sentences, and it preserves word order within the document. As a result, it can generate topics at two levels of granularity, namely segment-topics and word-topics, and it can generate n-gram words in each topic. We also develop an approximate inference scheme based on Gibbs sampling. We conduct extensive experiments on publicly available data from different collections and show that our model improves the quality of several text mining tasks: supporting fine-grained topics with n-gram words in the correlation graph, segmenting a document into topically coherent sections, document classification, and document likelihood estimation.
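
For readers unfamiliar with Gibbs sampling as an inference scheme for topic models, the following is a minimal sketch of a collapsed Gibbs sampler for plain bag-of-words latent Dirichlet allocation. It is not the model proposed in this paper: it ignores word order, document segments, and n-grams, and serves only to illustrate the general resample-one-assignment-at-a-time idea. The function name `lda_gibbs`, its parameters, and the toy data are assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iters=50, seed=0):
    """Collapsed Gibbs sampling for standard LDA (illustrative sketch only).

    docs: list of documents, each a list of integer word ids in
    range(vocab_size). Returns (assignments, doc_topic, topic_word).
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # n_{d,k}
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n_{k,w}
    topic_total = [0] * num_topics                              # n_k
    assignments = []

    # Initialise every token with a uniformly random topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalised full conditional p(z = t | all other z, words).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]

                # Draw a new topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return assignments, doc_topic, topic_word


if __name__ == "__main__":
    # Toy corpus (hypothetical): word ids 0-1 and 3-4 form two loose groups.
    toy_docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 2], [0, 1, 2, 0, 1]]
    z, doc_topic, topic_word = lda_gibbs(toy_docs, num_topics=2, vocab_size=5)
    print(doc_topic)
```

The sampler keeps only count tables and resamples one token's topic at a time, which is what makes the scheme approximate but simple to implement. A model that preserves word order and segment structure would need additional latent variables and count tables beyond those shown here.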