Previous semi-supervised learning (SSL) techniques usually assume that unlabeled data are relevant to the target task, i.e., that they follow the same distribution as the labeled target data. In this paper, we address a different and more difficult SSL scenario, in which the unlabeled data may be a mixture of data relevant and irrelevant to the target binary classification task. Our framework requires no explicit prior knowledge about the relatedness of the unlabeled data to the target data. To alleviate the effect of irrelevant unlabeled data while exploiting the implicit knowledge in all available data, we develop a novel maximum-margin classifier, named the tri-class support vector machine (3C-SVM), which seeks an inductive rule that separates the target binary classes well while identifying the irrelevant data as a by-product. To attain this goal, we introduce a new min loss function, which reduces the impact of irrelevant data while relying more on the labeled data and the relevant unlabeled data; this loss function thereby realizes the maximum entropy principle. The 3C-SVM thus generalizes standard SVMs, semi-supervised SVMs, and SVMs learned from the universum as special cases. We further analyze why the irrelevant data can help to improve model performance. For implementation, we relax and approximate the objective via the convex-concave procedure, which turns the original integer programming problem into solving a finite number of quadratic programming problems. Empirical results are reported to demonstrate the advantages of our 3C-SVM model.
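To make the optimization step concrete, below is a minimal sketch of the convex-concave procedure (CCCP) on a toy problem: the non-convex objective is split into a convex part and a concave part, and each iteration linearizes the concave part and solves the resulting convex subproblem (in the 3C-SVM, those subproblems are the quadratic programs mentioned above). The function names and the toy objective here are illustrative assumptions, not the paper's actual formulation.

    import numpy as np

    def cccp(solve_convex, grad_concave, theta0, tol=1e-6, max_iter=100):
        """Minimize J(theta) = J_vex(theta) + J_cav(theta) by iterating
        theta_{t+1} = argmin_theta J_vex(theta) + grad J_cav(theta_t) . theta.

        solve_convex(g):     minimizer of J_vex(theta) + g @ theta
                             (a convex subproblem, e.g. a quadratic program)
        grad_concave(theta): gradient of the concave part J_cav at theta
        """
        theta = np.asarray(theta0, dtype=float)
        for _ in range(max_iter):
            g = grad_concave(theta)        # linearize the concave part at theta_t
            theta_next = solve_convex(g)   # solve the convex surrogate
            if np.linalg.norm(theta_next - theta) < tol:
                return theta_next
            theta = theta_next
        return theta

    # Toy 1-D instance: J(x) = (x - 2)^2 - x^2 / 2, whose minimizer is x = 4.
    # Convex part: (x - 2)^2; concave part: -x^2 / 2.
    solve = lambda g: np.array([2.0 - g[0] / 2.0])  # closed-form argmin of (x-2)^2 + g*x
    grad = lambda x: -x                             # gradient of -x^2 / 2
    print(cccp(solve, grad, theta0=[0.0]))          # converges to approx. [4.0]

Each CCCP iteration strictly decreases the objective (the linearization upper-bounds the concave part), which is why the procedure terminates after a finite number of convex subproblems; in the 3C-SVM each such subproblem is a standard QP solvable by any off-the-shelf solver.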