A probabilistic model for text categorization: based on a single random variable with multiple values

Authors:
Makoto Iwayama; Takenobu Tokunaga
Affiliations:
Advanced Research Laboratory, Hitachi Ltd., Hatoyama, Saitama, Japan; Tokyo Institute of Technology, Ôokayama, Meguro-ku, Tokyo, Japan
Venue:
ANLC '94 Proceedings of the fourth conference on Applied natural language processing
Year:
1994

Abstract

Text categorization is the classification of documents with respect to a set of predefined categories. In this paper, we propose a new probabilistic model for text categorization, which is based on a Single random Variable with Multiple Values (SVMV). Compared to previous probabilistic models, our model has the following advantages: (1) it considers within-document term frequencies, (2) it considers term weighting for target documents, and (3) it is less affected by insufficient training cases. We verify our model's superiority over the other models in the task of categorizing news articles from the "Wall Street Journal".
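The abstract names SVMV but does not state its scoring rule. As a hedged illustration only, the sketch below assumes a rule of the form P(c|d) ≈ P(c) · Σ_t P(T=t|c) · P(T=t|d) / P(T=t), where T is a single random variable whose values are index terms. This form follows from P(c|d) = Σ_t P(c|d,T=t) P(T=t|d) under the approximation P(c|d,T=t) ≈ P(c|T=t), followed by Bayes' rule; treat it as our assumption and check it against the paper. In this sketch, P(T=t|d) is the relative frequency of t within the target document, which is one way to realize advantages (1) and (2). The class name `SVMVSketch` and all estimators (document-share prior, unsmoothed relative frequencies, skipping terms unseen in training) are hypothetical choices, not the authors' implementation.

```python
from collections import Counter
from typing import Dict, List


class SVMVSketch:
    """Hypothetical scorer: P(c) * sum_t P(T=t|c) * P(T=t|d) / P(T=t).

    The estimators below are illustrative assumptions, not the paper's.
    """

    def __init__(self) -> None:
        self.prior: Dict[str, float] = {}                    # P(c)
        self.p_t_given_c: Dict[str, Dict[str, float]] = {}   # P(T=t|c)
        self.p_t: Dict[str, float] = {}                      # P(T=t)

    def fit(self, docs: List[List[str]], labels: List[str]) -> "SVMVSketch":
        doc_counts = Counter(labels)
        term_counts: Dict[str, Counter] = {c: Counter() for c in doc_counts}
        all_terms: Counter = Counter()
        for tokens, c in zip(docs, labels):
            term_counts[c].update(tokens)
            all_terms.update(tokens)
        n_docs = len(labels)
        n_tokens = sum(all_terms.values())
        # Assumed prior: share of training documents in each category.
        self.prior = {c: k / n_docs for c, k in doc_counts.items()}
        # Assumed P(T=t): relative frequency of t over all training tokens.
        self.p_t = {t: k / n_tokens for t, k in all_terms.items()}
        # Assumed P(T=t|c): relative frequency of t among tokens of category c.
        for c, counts in term_counts.items():
            total = sum(counts.values())
            self.p_t_given_c[c] = {t: k / total for t, k in counts.items()}
        return self

    def score(self, tokens: List[str]) -> Dict[str, float]:
        tf = Counter(tokens)
        length = len(tokens)
        scores: Dict[str, float] = {}
        for c, prior in self.prior.items():
            s = 0.0
            for t, k in tf.items():
                if t not in self.p_t:  # term never seen in training: skip it
                    continue
                p_t_d = k / length  # within-document relative frequency P(T=t|d)
                s += self.p_t_given_c[c].get(t, 0.0) * p_t_d / self.p_t[t]
            scores[c] = prior * s
        return scores

    def predict(self, tokens: List[str]) -> str:
        scores = self.score(tokens)
        return max(scores, key=scores.get)


docs = [["stock", "market", "rise"], ["stock", "price", "fall"],
        ["team", "win", "game"], ["game", "score", "team"]]
labels = ["finance", "finance", "sports", "sports"]
model = SVMVSketch().fit(docs, labels)
print(model.predict(["team", "score", "stock"]))  # sports
```

Because the sketch sums per-term evidence instead of multiplying per-term likelihoods, a term that never occurs in a category's training data contributes zero to that category's score rather than forcing the whole score to zero. That is one plausible reading of advantage (3), offered as our interpretation rather than a claim from the abstract.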