Inductive learning infers a general rule from a finite data set and uses it to label new data. In transduction, one instead uses a labeled training set to label a given set of unlabeled points, which are presented to the learner before learning begins. Although transduction appears at the outset to be an easier task than induction, few provably useful transduction algorithms exist, and the precise relation between induction and transduction has not yet been determined. The main theoretical developments related to transduction were presented by Vapnik more than twenty years ago. One of Vapnik's basic results is a rather tight error bound for transductive classification based on an exact computation of the hypergeometric tail. While tight, this bound is given only implicitly, via a computational routine. Our first contribution is a somewhat looser but explicit characterization of a slightly extended PAC-Bayesian version of Vapnik's transductive bound. This characterization is obtained using concentration inequalities for tails of sums of random variables obtained by sampling without replacement. We then derive error bounds for compression schemes such as (transductive) support vector machines and for transduction algorithms based on clustering. The key observation behind these new bounds and algorithms is that the unlabeled test points, which in the transductive setting are known in advance, can be used to construct useful data-dependent prior distributions over the hypothesis space.
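The "exact computation of the hypergeometric tail" behind Vapnik's implicit bound can be made concrete: for a fixed hypothesis, the number of its errors that land in a training set drawn uniformly without replacement from the full sample is hypergeometrically distributed, and the bound is obtained by inverting that tail numerically. The sketch below illustrates this idea; the function names and the simple linear-search inversion are our own, not the paper's routine:

```python
from math import comb

def hypergeom_pmf(N, K, m, j):
    """P(exactly j of the K erroneous points fall in a training set of
    size m drawn uniformly without replacement from all N points)."""
    return comb(K, j) * comb(N - K, m - j) / comb(N, m)

def hypergeom_tail(N, K, m, t):
    """P(at most t of the K erroneous points fall in the training set)."""
    lo = max(0, m - (N - K))   # smallest feasible number of training errors
    hi = min(t, K, m)
    return sum(hypergeom_pmf(N, K, m, j) for j in range(lo, hi + 1))

def transductive_test_error_bound(m, u, train_errors, delta):
    """Illustrative inversion of the tail: reject any total error count K
    for which observing only `train_errors` errors on the training set has
    probability below delta.  The tail is monotone decreasing in K, so a
    linear search over K finds the largest count not rejected; the bound
    on the test-error rate is the remaining (K - train_errors) / u."""
    N = m + u
    best = train_errors
    for K in range(train_errors, train_errors + u + 1):
        if hypergeom_tail(N, K, m, train_errors) >= delta:
            best = K
        else:
            break
    return (best - train_errors) / u
```

For example, with m = u = 10, zero observed training errors, and delta = 0.05, the search returns a test-error bound of 0.3: a hypothesis with more than 3 errors on the 10 test points would, with probability at least 0.95, have shown at least one error on the training set.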