Learning Classification with Both Labeled and Unlabeled Data

Authors:
Jean-Noël Vittaut;Massih-Reza Amini;Patrick Gallinari
Venue:
ECML '02 Proceedings of the 13th European Conference on Machine Learning
Year:
2002


Abstract

A key difficulty in applying machine learning classification algorithms to many applications is that they require a large number of hand-labeled examples. Labeling large amounts of data is a costly process that is in many cases prohibitive. In this paper we show how a small number of labeled examples, used together with a large number of unlabeled examples, can produce high-accuracy classifiers. Our approach does not rely on any parametric assumptions about the data, as is usually the case with the generative methods widely used in semi-supervised learning. We propose new discriminant algorithms that handle both labeled and unlabeled data for training classification models, and we analyze their performance on different information access problems, ranging from text span classification for text summarization to e-mail spam detection and text classification.
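The abstract does not spell out the proposed algorithms, so the following is only an illustrative sketch of the general idea of training a discriminant model on both labeled and unlabeled data. It uses a generic self-training loop around a logistic regression classifier, which is a swapped-in technique and should not be read as the authors' method. The toy data, the function names (`fit_logistic`, `self_train`, `make_blobs`) and all hyperparameters are assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, epochs=200, lr=0.1):
    """Fit a logistic regression model (w, b) by batch gradient descent."""
    dim = len(xs[0])
    w, b, n = [0.0] * dim, 0.0, len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, x):
    """Return the hard class decision (0 or 1) of the linear discriminant."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0


def self_train(labeled_x, labeled_y, unlabeled_x, max_rounds=10):
    """Hypothetical self-training loop: label the unlabeled pool with the
    current model, retrain on everything, stop when the labels are stable."""
    model = fit_logistic(labeled_x, labeled_y)
    pseudo = None
    for _ in range(max_rounds):
        new_pseudo = [predict(model, x) for x in unlabeled_x]
        if new_pseudo == pseudo:
            break
        pseudo = new_pseudo
        model = fit_logistic(labeled_x + unlabeled_x, labeled_y + pseudo)
    return model


def make_blobs(rng, n_per_class):
    """Toy data (an assumption): two 2-D Gaussian blobs, one per class."""
    xs, ys = [], []
    for label, center in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            xs.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            ys.append(label)
    return xs, ys


rng = random.Random(0)
# Four hand-labeled examples and a much larger unlabeled pool.
labeled_x = [[-2.0, -1.5], [-1.5, -2.5], [2.0, 1.5], [1.5, 2.5]]
labeled_y = [0, 0, 1, 1]
unlabeled_x, _ = make_blobs(rng, 100)
test_x, test_y = make_blobs(rng, 100)

model = self_train(labeled_x, labeled_y, unlabeled_x)
accuracy = sum(predict(model, x) == y for x, y in zip(test_x, test_y)) / len(test_y)
print("test accuracy:", accuracy)
```

The sketch makes no parametric assumption about how the inputs are distributed within each class; it only fits the decision function, which is the sense in which a discriminant approach differs from a generative one.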