A novel virtual sample generation method based on Gaussian distribution

  • Authors:
  • Jing Yang;Xu Yu;Zhi-Qiang Xie;Jian-Pei Zhang

  • Affiliations:
  • College of Computer Science and Technology, Harbin Engineering University, Harbin, China;College of Computer Science and Technology, Harbin Engineering University, Harbin, China;College of Computer Science and Technology, Harbin Engineering University, Harbin, China and College of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China;College of Computer Science and Technology, Harbin Engineering University, Harbin, China

  • Venue:
  • Knowledge-Based Systems
  • Year:
  • 2011

Abstract

Traditional machine learning algorithms often generalize poorly when trained on noisy, imbalanced, or small training sets. This work proposes a novel virtual sample generation (VSG) method based on the Gaussian distribution. First, the method determines the mean and the standard error of the Gaussian distribution. Then, virtual samples are generated from this distribution. Finally, a new training set is constructed by adding the virtual samples to the original training set. The work shows that training on the new training set is equivalent to a form of regularization for small-sample problems, or to cost-sensitive learning for imbalanced-sample problems. Experiments show that, given a suitable number of virtual sample replicates, classifiers trained on the new training sets can generalize better than those trained on the original training sets.
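The three steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the mean and spread of the Gaussian are chosen, so this sketch assumes that each virtual sample is drawn from a Gaussian centered on an original sample, with one user-chosen standard deviation shared by all features, and that the virtual sample inherits the label of its source. The function name `generate_virtual_samples` and the parameters `sigma`, `replicates`, and `seed` are hypothetical and are not taken from the paper.

```python
import random


def generate_virtual_samples(X, y, sigma=0.1, replicates=3, seed=0):
    """Return the original training set plus Gaussian-perturbed copies.

    Assumption (not stated in the abstract): the Gaussian mean is the
    original sample itself and `sigma` is a single user-chosen standard
    deviation applied to every feature.
    """
    rng = random.Random(seed)
    # Start from copies of the original samples and labels.
    X_new = [list(x) for x in X]
    y_new = list(y)
    for x, label in zip(X, y):
        for _ in range(replicates):
            # Draw each feature independently around its original value.
            X_new.append([rng.gauss(v, sigma) for v in x])
            # The virtual sample keeps the label of its source sample.
            y_new.append(label)
    return X_new, y_new


X = [[0.0, 1.0], [2.0, 3.0]]
y = [0, 1]
X_aug, y_aug = generate_virtual_samples(X, y, sigma=0.1, replicates=3)
print(len(X_aug), len(y_aug))  # prints "8 8": 2 originals + 2 * 3 virtual samples
```

Under these assumptions, `replicates` plays the role of the number of virtual sample replicates mentioned in the abstract. For an imbalanced training set, one plausible variant would use a larger `replicates` value for the minority class only.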