Topic analysis using a finite mixture model

Authors:
Hang Li;Kenji Yamanishi
Affiliations:
Microsoft Research Asia, 5F Sigma Center, No. 49 Zhichun Road, Haidian District, Beijing, China and Internet Systems Research Laboratories, NEC Corporation, 4-1-1 Miyazaki, Miyamae-ku, Kawasaki 21 ...;Internet Systems Research Laboratories, NEC Corporation, 4-1-1 Miyazaki, Miyamae-ku, Kawasaki 216-855, Japan
Venue:
Information Processing and Management: an International Journal
Year:
2003

Abstract

This paper addresses topic analysis: determining a text's topic structure, a representation of which topics the text contains and how those topics change within it. Topic analysis consists of two main tasks, topic identification and text segmentation. Although it would be extremely useful in a variety of text processing applications, no previous study has sufficiently addressed it. This paper proposes a statistical learning approach. Topics are represented by word clusters, and a finite mixture model, referred to as a stochastic topic model (STM), represents the word distribution within a text. A given text is segmented by detecting significant differences between STMs, and topics are identified by estimating STMs. Experimental results indicate that the proposed method significantly outperforms methods that combine existing techniques.
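
The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a standard finite mixture over words, P(w) = sum over topics k of P(k) * P(w | k), with each topic's word distribution P(w | k) fixed in advance (standing in for a word cluster) and only the mixture weights P(k) estimated by EM. The function names, the toy word clusters, and the use of total variation distance as a stand-in for "significant differences between STMs" are all assumptions made for illustration.

```python
from collections import Counter


def estimate_topic_weights(words, topic_word_probs, iterations=50):
    """Estimate mixture weights P(k) by EM, holding each P(w | k) fixed.

    topic_word_probs maps a topic name to a dict {word: P(word | topic)}.
    Words that no topic covers are ignored (an assumption of this sketch).
    """
    topics = list(topic_word_probs)
    counts = Counter(
        w for w in words
        if any(topic_word_probs[k].get(w, 0.0) > 0.0 for k in topics)
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no word in the text is covered by any topic")

    # Start from uniform weights.
    weights = {k: 1.0 / len(topics) for k in topics}
    for _ in range(iterations):
        expected = {k: 0.0 for k in topics}
        for w, n in counts.items():
            # E-step: responsibility of each topic for word w.
            joint = {k: weights[k] * topic_word_probs[k].get(w, 0.0) for k in topics}
            norm = sum(joint.values())
            for k in topics:
                expected[k] += n * joint[k] / norm
        # M-step: new weights are the normalized expected counts.
        weights = {k: expected[k] / total for k in topics}
    return weights


def weight_shift(weights_a, weights_b):
    """Total variation distance between two weight vectors (illustrative only)."""
    topics = set(weights_a) | set(weights_b)
    return 0.5 * sum(abs(weights_a.get(k, 0.0) - weights_b.get(k, 0.0)) for k in topics)


# Hypothetical word clusters, invented for this example.
TOPICS = {
    "trade": {"export": 0.5, "tariff": 0.5},
    "sports": {"goal": 0.5, "match": 0.5},
}

whole_text = estimate_topic_weights(["export", "tariff", "export", "goal"], TOPICS)
first_block = estimate_topic_weights(["export", "tariff"], TOPICS)
second_block = estimate_topic_weights(["goal", "match"], TOPICS)
shift = weight_shift(first_block, second_block)
```

In this toy setup the estimated weights act as a rough form of topic identification, and a large shift between adjacent blocks suggests a candidate segment boundary. The paper's actual criterion for a significant difference between STMs is not stated in the abstract and may differ.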