A hybrid generative/discriminative approach to text classification with additional information

Authors:
Akinori Fujino; Naonori Ueda; Kazumi Saito
Affiliations:
NTT Communication Science Laboratories, NTT Corporation, Kyoto, Japan (all authors)
Venue:
Information Processing and Management: an International Journal - Special issue: AIRS2005: Information retrieval research in Asia
Year:
2007

Abstract

This paper presents a classifier for text data samples, such as Web pages and technical papers, that consist of main text and additional components. We focus on multiclass, single-labeled text classification problems and design the classifier as a hybrid of probabilistic generative and discriminative approaches. Our formulation trains an individual generative model for each component and constructs the classifier by combining these trained models according to the maximum entropy principle. We use naive Bayes models as the component generative models for the main text and for additional components such as titles, links, and authors, so that the formulation applies to document and Web page classification problems. Experimental results on four test collections confirmed that the hybrid approach effectively combined main text and additional components and thus improved classification performance.
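
The combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the maximum entropy combination takes the usual log-linear form, in which each component's naive Bayes log-likelihood receives a weight (here `lam`) and each class a bias (here `mu`), and that these are fitted by plain gradient ascent on the conditional log-likelihood. The abstract does not specify the exact form or the training procedure, so all function names, parameter names, hyperparameters, and the toy data below are hypothetical.

```python
import numpy as np

def fit_nb(counts, labels, n_classes, alpha=1.0):
    """Multinomial naive Bayes: log word probabilities, one row per class."""
    totals = np.zeros((n_classes, counts.shape[1]))
    for k in range(n_classes):
        totals[k] = counts[labels == k].sum(axis=0)
    smoothed = totals + alpha  # Laplace smoothing (an assumption)
    return np.log(smoothed / smoothed.sum(axis=1, keepdims=True))

def component_scores(models, components):
    """Log-likelihood of each component under each class, up to a
    class-independent constant. Shape: (docs, components, classes)."""
    return np.stack([x @ m.T for x, m in zip(components, models)], axis=1)

def posterior(scores, lam, mu):
    """Log-linear combination: softmax over sum_j lam[j] * score_j + mu."""
    logits = np.einsum("j,njk->nk", lam, scores) + mu
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def fit_combiner(scores, labels, n_classes, lr=0.05, steps=300, l2=0.01):
    """Gradient ascent on the L2-regularized conditional log-likelihood."""
    n, n_comp, _ = scores.shape
    lam, mu = np.ones(n_comp), np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        diff = onehot - posterior(scores, lam, mu)
        lam += lr * (np.einsum("nk,njk->j", diff, scores) / n - l2 * lam)
        mu += lr * (diff.mean(axis=0) - l2 * mu)
    return lam, mu

# Toy data (hypothetical): word counts for two components of four documents.
main_text = np.array([[3, 1, 0, 0], [2, 2, 0, 1], [0, 0, 3, 1], [0, 1, 2, 2]])
titles = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]])
labels = np.array([0, 0, 1, 1])

# Generative step: one naive Bayes model per component.
models = [fit_nb(main_text, labels, 2), fit_nb(titles, labels, 2)]
# Discriminative step: learn how much to trust each component.
train_scores = component_scores(models, [main_text, titles])
lam, mu = fit_combiner(train_scores, labels, 2)
train_post = posterior(train_scores, lam, mu)

# Classify an unseen document from its main text and title counts.
new_scores = component_scores(
    models, [np.array([[2, 0, 0, 0]]), np.array([[1, 0, 0]])]
)
new_pred = posterior(new_scores, lam, mu).argmax(axis=1)
```

In this sketch the combiner is fitted on the same documents used to train the naive Bayes models, which is a simplification. On real data that reuse would tend to overstate how reliable each component is, so held-out or cross-validated component scores would be a safer input to `fit_combiner`.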