A statistical model for topic segmentation and clustering

Authors:
M. Mahdi Shafiei; Evangelos E. Milios
Affiliations:
Faculty of Computer Science, Dalhousie University; Faculty of Computer Science, Dalhousie University
Venue:
Canadian AI'08 Proceedings of the Canadian Society for computational studies of intelligence, 21st conference on Advances in artificial intelligence
Year:
2008

Abstract

This paper presents a statistical model for discovering topical clusters of words in unstructured text. The model has a hierarchical Bayesian structure and also identifies topically coherent segments of text. It assigns each segment to a particular topic and thereby categorizes the corresponding document under potentially multiple topics. We present initial results indicating that the word topics discovered by the proposed model are more consistent than those of other models. Our early experiments on a real text corpus show that the model's clustering performance compares well with that of other clustering models, which do not provide topic segmentation. Its segmentation performance is also comparable to that of a recently proposed segmentation model, which does not provide document clustering.