Learning concepts from large scale imbalanced data sets using support cluster machines

Authors:
Jinhui Yuan; Jianmin Li; Bo Zhang
Affiliations:
Tsinghua University, Beijing, P. R. China; Tsinghua University, Beijing, P. R. China; Tsinghua University, Beijing, P. R. China
Venue:
MULTIMEDIA '06 Proceedings of the 14th annual ACM international conference on Multimedia
Year:
2006

Abstract

This paper considers the problem of using Support Vector Machines (SVMs) to learn concepts from large-scale imbalanced data sets. The objective is twofold. First, we investigate the effects of large scale and class imbalance on SVMs, highlighting the role of linear non-separability in this problem. Second, we develop a practical and theoretically guaranteed meta-algorithm to handle scale and imbalance together. The approach, named Support Cluster Machines (SCMs), incorporates informative and representative under-sampling mechanisms to speed up the training procedure. SCMs differ from previous similar ideas in two ways: (a) a theoretical foundation is provided, and (b) the clustering is performed in the feature space rather than in the input space. The theoretical analysis not only justifies the proposed approach but also guides its technical choices. Finally, experiments are carried out on both synthetic and TRECVID data. The results support the analysis and show that SCMs are efficient and effective when dealing with large-scale imbalanced data sets.
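
The idea of representative under-sampling with clustering in the feature space can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the clustering method, the kernel, or how representatives are chosen, so the kernel k-means procedure, the RBF kernel, the farthest-first seeding, and the names `rbf`, `kernel_kmeans`, and `under_sample` are all assumptions made for illustration. The SVM training step itself is omitted.

```python
import math


def rbf(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) (assumed kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_kmeans(K, k, iters=50):
    """Cluster n items in feature space, given their n x n kernel matrix K.

    Assumes k <= n. Returns one cluster label per item.
    """
    n = len(K)

    def dist(i, j):
        # Squared feature-space distance between items i and j.
        return K[i][i] + K[j][j] - 2.0 * K[i][j]

    # Farthest-first seeding (an assumption, chosen to be deterministic).
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(n), key=lambda i: min(dist(i, s) for s in seeds)))
    labels = [min(range(k), key=lambda c: dist(i, seeds[c])) for i in range(n)]

    for _ in range(iters):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # (1/|c|^2) * sum of K over all pairs inside cluster c.
        within = [
            sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
            for m in members
        ]

        def d2(i, c):
            # Squared distance from item i to the mean of cluster c.
            m = members[c]
            return K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + within[c]

        nonempty = [c for c in range(k) if members[c]]
        new_labels = [min(nonempty, key=lambda c: d2(i, c)) for i in range(n)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def under_sample(majority, k, gamma=1.0):
    """Replace the majority class by at most k (point, weight) representatives.

    Each representative is the cluster member closest to the cluster mean in
    feature space; its weight is the cluster size.
    """
    K = [[rbf(p, q, gamma) for q in majority] for p in majority]
    labels = kernel_kmeans(K, k)
    reps = []
    for c in range(k):
        m = [i for i, lab in enumerate(labels) if lab == c]
        if not m:
            continue
        best = min(m, key=lambda i: K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m))
        reps.append((majority[best], len(m)))
    return reps


# Toy data: 8 majority points in two groups, 2 minority points.
majority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
minority = [(2.5, 2.5), (2.6, 2.4)]
reps = under_sample(majority, k=2)
# Reduced training set: all minority points plus weighted representatives.
training_set = [(p, +1, 1) for p in minority] + [(p, -1, w) for p, w in reps]
```

In this sketch the eight majority points shrink to two weighted representatives, so a weighted SVM would train on four points instead of ten. Working from the kernel matrix rather than the raw coordinates is what makes the clustering happen in the feature space instead of the input space.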