Document classification using a finite mixture model

Authors:
Hang Li;Kenji Yamanishi
Affiliations:
C&C Res. Labs., NEC, Miyazaki Miyamae-ku Kawasaki, Japan;C&C Res. Labs., NEC, Miyazaki Miyamae-ku Kawasaki, Japan
Venue:
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Year:
1997

Abstract

We propose a new method of classifying documents into categories. We define for each category a finite mixture model based on soft clustering of words. We treat the problem of classifying documents as that of conducting statistical hypothesis testing over finite mixture models, and employ the EM algorithm to efficiently estimate parameters in a finite mixture model. Experimental results indicate that our method outperforms existing methods.
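
The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions. The word clusters are given as fixed component distributions P(w | k). EM estimates only the per-category mixing weights. The hypothesis test is simplified to picking the category with the highest document log-likelihood. The function names (`em_mixture_weights`, `log_likelihood`, `classify`), the `floor` smoothing constant, and the toy data are all hypothetical.

```python
import math


def em_mixture_weights(words, components, iters=50):
    """Estimate mixing weights theta for P(w) = sum_k theta[k] * P(w | k).

    words: word tokens pooled from one category's training documents.
    components: list of dicts mapping word -> P(w | k); held fixed (assumption).
    """
    k = len(components)
    theta = [1.0 / k] * k  # start from uniform weights
    for _ in range(iters):
        counts = [0.0] * k
        for w in words:
            # E-step: posterior responsibility of each component for word w.
            joint = [theta[j] * components[j].get(w, 0.0) for j in range(k)]
            total = sum(joint)
            if total == 0.0:
                continue  # word not covered by any component; skip it
            for j in range(k):
                counts[j] += joint[j] / total
        # M-step: new weights are the normalized expected counts.
        norm = sum(counts)
        if norm == 0.0:
            break
        theta = [c / norm for c in counts]
    return theta


def log_likelihood(words, theta, components, floor=1e-9):
    """Log-likelihood of a document under one category's mixture.

    floor is a hypothetical smoothing constant that avoids log(0).
    """
    ll = 0.0
    for w in words:
        p = sum(t * comp.get(w, 0.0) for t, comp in zip(theta, components))
        ll += math.log(max(p, floor))
    return ll


def classify(words, models, components):
    """Return the category whose mixture gives the document the highest likelihood."""
    return max(models, key=lambda c: log_likelihood(words, models[c], components))


# Toy word clusters and training data (illustrative only).
components = [
    {"ball": 0.5, "game": 0.5},     # cluster 0
    {"trade": 0.5, "tariff": 0.5},  # cluster 1
]
training = {
    "sports": ["ball", "game", "ball", "game", "trade"],
    "economy": ["trade", "tariff", "trade", "game"],
}
models = {c: em_mixture_weights(ws, components) for c, ws in training.items()}

print(classify(["ball", "game"], models, components))
print(classify(["tariff", "trade"], models, components))
```

Because every category shares the same word clusters and differs only in its mixing weights, a word that never occurred in a category's training data can still receive nonzero probability through its cluster. This is one plausible reading of what soft clustering of words contributes here.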