In recent years, semi-supervised learning algorithms have attracted considerable interest in the machine learning community because unlabeled samples are often readily available while labeled ones are expensive to obtain. Graph-based semi-supervised learning has been one of the most active research areas; however, scaling these methods to large datasets remains a challenge. In this paper, we apply the clustering feature (CF) tree to large-scale graph-based semi-supervised learning and propose a local-learning algorithm that integrates global structure. Organizing the unlabeled samples with a CF tree allows us to decompose them into a series of clusters (sub-trees) and learn on each locally. In each training pass over a sub-tree, the cluster centers are chosen as frame points that preserve the global structure of the input samples and propagate their labels to the unlabeled data. We compare our method with several existing large-scale algorithms on real-world datasets. The experiments show the scalability and accuracy improvements of the proposed approach, which can handle millions of samples efficiently.
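The pipeline described above — organize unlabeled data with a CF tree, then propagate the few known labels using cluster centers as summary points — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes scikit-learn, uses its `Birch` estimator (a CF-tree clusterer) to obtain subcluster centers as stand-ins for the frame points, and runs a single global `LabelPropagation` pass instead of the paper's per-sub-tree local training.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelPropagation

# Toy dataset: 600 samples, of which only 10 keep their labels.
# In scikit-learn's semi-supervised API, -1 marks an unlabeled sample.
X, y_true = make_moons(n_samples=600, noise=0.08, random_state=0)
y = np.full(len(X), -1)
labeled_idx = np.r_[0:5, 300:305]  # a few samples from each class
y[labeled_idx] = y_true[labeled_idx]

# Step 1: organize the samples with a CF tree (BIRCH). The subcluster
# centers play the role of "frame points" summarizing global structure.
birch = Birch(n_clusters=None, threshold=0.2)
birch.fit(X)
centers = birch.subcluster_centers_

# Step 2: propagate the known labels over a similarity graph. A scalable
# variant would instead run this locally on each sub-tree, augmented with
# the frame points, rather than on the full dataset at once.
lp = LabelPropagation(kernel="rbf", gamma=20)
lp.fit(X, y)
pred = lp.transduction_

accuracy = (pred == y_true).mean()
print(f"{len(centers)} frame points, transductive accuracy {accuracy:.2f}")
```

The design point this sketch hints at: propagation cost grows with the graph size, so restricting each propagation to one sub-tree plus a small set of shared frame points is what makes the method tractable on millions of samples.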