IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
This paper presents a method for designing a semi-supervised classifier for multi-component data, such as web pages consisting of text and link information. The proposed method is a hybrid of generative and discriminative approaches that aims to take advantage of both. For each component, we consider an individual generative model trained on labeled samples, together with a model introduced to reduce the effect of the bias that results when there are few labeled samples. We then construct a hybrid classifier by combining all the models based on the maximum entropy principle. In experiments on three test collections, including web pages and technical papers, we confirmed that our hybrid approach was effective in improving the generalization performance of multi-component data classification.
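The combination step can be illustrated with a minimal sketch. It assumes that the maximum entropy combination takes a log-linear form, in which each component model's class-conditional log-probability is scaled by a learned weight and a per-class bias is added. The abstract does not give this formula, so the function name `combine_log_linear`, the parameters `weights` and `class_biases`, and all numeric values are hypothetical. In this sketch a bias-correction model would simply be passed as one more entry in `component_log_probs`, and how the weights are trained is left out.

```python
import math


def combine_log_linear(component_log_probs, weights, class_biases):
    """Combine per-component class scores into one class posterior.

    component_log_probs: one list per component (e.g. text, links), each
        holding log p_j(x_j | class k) for every class k.
    weights: one combination weight per component (assumed already learned).
    class_biases: one bias term per class (assumed already learned).
    """
    n_classes = len(class_biases)
    # Weighted sum of component log-probabilities plus the class bias.
    scores = [
        class_biases[k]
        + sum(w * lp[k] for w, lp in zip(weights, component_log_probs))
        for k in range(n_classes)
    ]
    # Softmax, shifted by the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-class example with a text component and a link component.
text_lp = [math.log(0.7), math.log(0.3)]
link_lp = [math.log(0.4), math.log(0.6)]
posterior = combine_log_linear(
    [text_lp, link_lp], weights=[1.0, 0.5], class_biases=[0.0, 0.0]
)
predicted_class = max(range(len(posterior)), key=posterior.__getitem__)
```

With the link component down-weighted, the text component's preference for the first class dominates the combined posterior. Setting all weights and biases to zero yields a uniform posterior, which is a quick sanity check on the normalization.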