Over-Sampling from an auxiliary domain

Authors:
Samir Al-Stouhi;Abhilash Pandya
Affiliations:
Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI;Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI
Venue:
ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part I
Year:
2012

Abstract

The exponential growth in data dimensionality presents an obstacle in informatics, as data miners try to construct ever larger training sets to overcome the limitations imposed by statistical learning theory. Machine learning models require a minimum number of samples for each label to develop a representative hypothesis. To overcome these bounds, we developed an algorithm that extracts samples from an auxiliary domain to augment the training set. Our work draws on concepts from transfer learning and imbalanced learning to expand the training set so that standard models can be applied. We present a theoretical verification of our method and demonstrate the effectiveness of our framework with experimental results on real-world data.
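
The abstract does not specify how samples are selected from the auxiliary domain, so the following is only a minimal illustrative sketch of the general idea of augmenting an under-represented label with auxiliary samples. It is not the authors' algorithm. The function name `augment_minority`, its parameters, and the nearest-neighbor selection rule (Euclidean distance to the closest target minority sample) are all assumptions made for illustration.

```python
import numpy as np


def augment_minority(X_target, y_target, X_aux, y_aux, minority_label, n_extra):
    """Hypothetical helper: append the n_extra auxiliary samples of the
    minority label that lie closest to the target domain's minority samples.

    Assumes both domains share one feature space and that the target set
    contains at least one sample of the minority label.
    """
    X_min = X_target[y_target == minority_label]
    candidates = X_aux[y_aux == minority_label]

    # Pairwise Euclidean distances, shape (n_candidates, n_target_minority).
    diffs = candidates[:, None, :] - X_min[None, :, :]
    pairwise = np.linalg.norm(diffs, axis=2)

    # Score each candidate by its distance to the nearest target minority sample.
    nearest = pairwise.min(axis=1)
    keep = np.argsort(nearest)[:n_extra]

    X_new = np.vstack([X_target, candidates[keep]])
    y_new = np.concatenate([y_target, np.full(len(keep), minority_label)])
    return X_new, y_new


# Toy data: label 1 is the under-represented class in the target domain.
X_target = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y_target = np.array([0, 0, 1])
X_aux = np.array([[5.1, 5.0], [9.0, 9.0], [0.0, 0.1]])
y_aux = np.array([1, 1, 0])

X_new, y_new = augment_minority(X_target, y_target, X_aux, y_aux,
                                minority_label=1, n_extra=1)
```

A standard classifier could then be trained on `X_new` and `y_new`. Distance-based selection is only one possible criterion. The paper's approach draws on transfer learning and imbalanced learning and may differ substantially from this sketch.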