Single-shot detection of multiple categories of text using parametric mixture models

Authors:
Naonori Ueda;Kazumi Saito
Affiliations:
2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto, Japan;2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto, Japan
Venue:
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2002

Abstract

We address the problem of detecting multiple topics or categories of a text, where a text is not assumed to belong to exactly one of a set of mutually exclusive categories. Conventionally, a binary classification approach has been employed, in which a separate binary classifier for each category judges whether or not the text belongs to that category. We propose a more sophisticated approach that detects multiple categories of a text simultaneously using parametric mixture models (PMMs), which are newly presented in this paper. PMMs are probabilistic generative models for text that has multiple categories. They differ essentially from the conventional mixture of multinomial distributions: in PMMs, several basis multinomial parameters are mixed in the parameter space, whereas in the conventional mixture, several multinomial component distributions are mixed. We derive efficient learning algorithms for PMMs within the framework of maximum a posteriori estimation. We also show empirically that our method can outperform the conventional binary approach when applied to multitopic detection of World Wide Web pages, focusing on those from the "yahoo.com" domain.
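
The distinction between mixing in the parameter space and mixing component distributions can be illustrated with a small sketch. This is not the authors' implementation: the function names, the two-word vocabulary, the basis parameters, and the choice of equal weights over a document's active categories are all assumptions made for illustration. Both functions omit the multinomial coefficient, which is the same constant in each case.

```python
import math

def active_categories(labels):
    """Indices of the categories a document belongs to (labels are 0/1)."""
    return [l for l, y in enumerate(labels) if y]

def parameter_mixing_log_likelihood(counts, basis, labels):
    """Mix the basis multinomial parameters first, then score the document
    under the single multinomial that results (parameter-space mixing)."""
    active = active_categories(labels)
    weight = 1.0 / len(active)  # assumption: equal weight per active category
    mixed = [sum(weight * basis[l][i] for l in active)
             for i in range(len(counts))]
    return sum(x * math.log(p) for x, p in zip(counts, mixed) if x > 0)

def component_mixing_log_likelihood(counts, basis, labels):
    """Score the document under each component multinomial separately, then
    mix the resulting likelihoods (conventional mixture of multinomials)."""
    active = active_categories(labels)
    weight = 1.0 / len(active)  # assumption: equal weight per active category
    per_component = [
        sum(x * math.log(basis[l][i]) for i, x in enumerate(counts) if x > 0)
        for l in active
    ]
    # log-sum-exp for numerical stability
    top = max(per_component)
    return top + math.log(sum(weight * math.exp(c - top) for c in per_component))

# Hypothetical toy data: two categories over a two-word vocabulary.
# All probabilities are strictly positive so that math.log is defined.
basis = [
    [0.9, 0.1],  # category 0 favours word 0
    [0.1, 0.9],  # category 1 favours word 1
]
doc = [5, 5]           # word counts of a document that uses both words equally
both_labels = [1, 1]   # the document belongs to both categories

pmm_score = parameter_mixing_log_likelihood(doc, basis, both_labels)
mixture_score = component_mixing_log_likelihood(doc, basis, both_labels)
```

For this balanced two-category document, the parameter-space score is higher than the component-mixture score, because the blended parameter vector explains a document that draws on both vocabularies, whereas each individual component explains only half of it well. For a document with a single active category, the two functions return the same value.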