In recent years, semi-supervised learning algorithms have attracted considerable interest in the machine learning community because unlabeled samples are often readily available while labeled ones are expensive to obtain. Graph-based semi-supervised learning has been one of the most active research areas; however, scaling these methods to large datasets remains a challenge. In this paper, we apply the clustering feature (CF) tree to large-scale graph-based semi-supervised learning and propose a local-learning algorithm that integrates global structure. Organizing the unlabeled samples with a CF tree allows us to decompose them into a series of clusters (sub-trees) and learn on each locally. In each training pass over a sub-tree, the cluster centers are chosen as frame points that preserve the global structure of the input samples and propagate their labels to the unlabeled data. We compare our method with several existing large-scale algorithms on real-world datasets. The experiments show the scalability and accuracy improvements of the proposed approach, which can handle millions of samples efficiently.
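The pipeline described above — organize unlabeled data with a CF tree, then propagate the few known labels using cluster centers as summary points — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes scikit-learn, uses its `Birch` estimator (a CF-tree clusterer) to obtain subcluster centers as stand-ins for the frame points, and runs a single global `LabelPropagation` pass instead of the paper's per-sub-tree local training.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelPropagation

# Toy dataset: 600 samples, of which only 10 keep their labels.
# In scikit-learn's semi-supervised API, -1 marks an unlabeled sample.
X, y_true = make_moons(n_samples=600, noise=0.08, random_state=0)
y = np.full(len(X), -1)
labeled_idx = np.r_[0:5, 300:305]  # a few samples from each class
y[labeled_idx] = y_true[labeled_idx]

# Step 1: organize the samples with a CF tree (BIRCH). The subcluster
# centers play the role of "frame points" summarizing global structure.
birch = Birch(n_clusters=None, threshold=0.2)
birch.fit(X)
centers = birch.subcluster_centers_

# Step 2: propagate the known labels over a similarity graph. A scalable
# variant would instead run this locally on each sub-tree, augmented with
# the frame points, rather than on the full dataset at once.
lp = LabelPropagation(kernel="rbf", gamma=20)
lp.fit(X, y)
pred = lp.transduction_

accuracy = (pred == y_true).mean()
print(f"{len(centers)} frame points, transductive accuracy {accuracy:.2f}")
```

The design point this sketch hints at: propagation cost grows with the graph size, so restricting each propagation to one sub-tree plus a small set of shared frame points is what makes the method tractable on millions of samples.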