Topic Extraction from Text Documents Using Multiple-Cause Networks

Authors:
Jeong-Ho Chang; Jae Won Lee; Yuseop Kim; Byoung-Tak Zhang
Venue:
PRICAI '02 Proceedings of the 7th Pacific Rim International Conference on Artificial Intelligence: Trends in Artificial Intelligence
Year:
2002

Abstract

This paper presents an approach to topic extraction from text documents using probabilistic graphical models. It uses multiple-cause networks with latent variables and employs Helmholtz machines to simplify learning and inference. Learning in this model is purely data-driven and does not require predefined categories for the documents. Topic-word extraction experiments on the TDT-2 collection are presented. In addition, document clustering results on a subset of the TREC-8 ad hoc task data show a substantial reduction in inference time without significant deterioration in performance.
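The sketch below shows how a Helmholtz machine pairs a generative multiple-cause network with a separate recognition network. It is not the authors' implementation. It assumes binary word-occurrence vectors, sigmoid (logistic) units, and the wake-sleep training algorithm from the original Helmholtz machine work, and the paper's exact network and training procedure may differ. The class and method names (`HelmholtzMachine`, `recognize`, `wake_sleep_step`, `topic_words`) and the toy vocabulary and documents are hypothetical. What it illustrates: after training, a document's latent topics are inferred with a single feed-forward pass through the recognition weights, with no iterative posterior inference per document.

```python
import math
import random


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class HelmholtzMachine:
    """Hypothetical two-layer Helmholtz machine: binary topics -> binary words.

    Generative model (assumed sigmoid belief net; several topics may be on at once):
        p(h_k = 1)     = sigmoid(b[k])
        p(v_d = 1 | h) = sigmoid(c[d] + sum_k W[d][k] * h[k])
    Recognition model (one feed-forward pass):
        q(h_k = 1 | v) = sigmoid(r[k] + sum_d R[k][d] * v[d])
    """

    def __init__(self, n_words: int, n_topics: int, seed: int = 0):
        self.rng = random.Random(seed)
        self.D, self.K = n_words, n_topics
        self.b = [0.0] * n_topics
        self.c = [0.0] * n_words
        self.W = [[self.rng.gauss(0.0, 0.1) for _ in range(n_topics)] for _ in range(n_words)]
        self.r = [0.0] * n_topics
        self.R = [[self.rng.gauss(0.0, 0.1) for _ in range(n_words)] for _ in range(n_topics)]

    def _bernoulli(self, p: float) -> int:
        return 1 if self.rng.random() < p else 0

    def recognize(self, v):
        """Approximate posterior q(h_k = 1 | v) for every topic k."""
        return [sigmoid(self.r[k] + sum(self.R[k][d] * v[d] for d in range(self.D)))
                for k in range(self.K)]

    def word_probs(self, h):
        """Generative probabilities p(v_d = 1 | h) for every word d."""
        return [sigmoid(self.c[d] + sum(self.W[d][k] * h[k] for k in range(self.K)))
                for d in range(self.D)]

    def wake_sleep_step(self, v, lr: float = 0.05):
        # Wake phase: sample topics from the recognition model, then nudge the
        # generative parameters toward reproducing the observed document v.
        h = [self._bernoulli(q) for q in self.recognize(v)]
        for k in range(self.K):
            self.b[k] += lr * (h[k] - sigmoid(self.b[k]))
        p = self.word_probs(h)
        for d in range(self.D):
            err = v[d] - p[d]
            self.c[d] += lr * err
            for k in range(self.K):
                self.W[d][k] += lr * err * h[k]
        # Sleep phase: "dream" (h, v) from the generative model, then nudge the
        # recognition parameters toward recovering the dreamed topics.
        h = [self._bernoulli(sigmoid(bk)) for bk in self.b]
        v_dream = [self._bernoulli(pd) for pd in self.word_probs(h)]
        q = self.recognize(v_dream)
        for k in range(self.K):
            err = h[k] - q[k]
            self.r[k] += lr * err
            for d in range(self.D):
                self.R[k][d] += lr * err * v_dream[d]

    def topic_words(self, vocab, k: int, n: int = 3):
        """The n words with the largest generative weight from topic k."""
        order = sorted(range(self.D), key=lambda d: self.W[d][k], reverse=True)
        return [vocab[d] for d in order[:n]]


# Toy usage with a hypothetical six-word vocabulary and four documents.
vocab = ["election", "vote", "senate", "goal", "match", "league"]
docs = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1], [1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1]]
hm = HelmholtzMachine(n_words=len(vocab), n_topics=2, seed=1)
for _ in range(500):
    for doc in docs:
        hm.wake_sleep_step(doc)
print([round(q, 2) for q in hm.recognize(docs[0])])
for k in range(2):
    print(k, hm.topic_words(vocab, k))
```

Because wake-sleep is stochastic, the learned topics depend on the seed and the number of passes, so no particular output is implied here. The design choice to note is that `recognize` costs one pass over the weights (O(K·D) per document). That is the kind of saving in inference time the abstract reports, though the figures in the paper come from the authors' own setup, not from this sketch.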