Learning classifiers when the training data is not IID

  • Authors:
  • Murat Dundar, Balaji Krishnapuram, Jinbo Bi, R. Bharat Rao

  • Affiliations:
  • Computer Aided Diagnosis & Therapy Group, Siemens Medical Solutions, Malvern, PA (all authors)

  • Venue:
  • IJCAI'07 Proceedings of the 20th International Joint Conference on Artificial Intelligence
  • Year:
  • 2007


Abstract

Most methods for classifier design assume that the training samples are drawn independently and identically from an unknown data-generating distribution, although this assumption is violated in several real-life problems. Relaxing this i.i.d. assumption, we consider algorithms from the statistics literature for the more realistic situation where batches or sub-groups of training samples may have internal correlations, although the samples from different batches may be considered to be uncorrelated. Next, we propose simpler (more efficient) variants that scale well to large datasets; theoretical results from the literature are provided to support their validity. Experimental results from real-life computer-aided diagnosis (CAD) problems indicate that relaxing the i.i.d. assumption leads to statistically significant improvements in the accuracy of the learned classifier. Surprisingly, the simpler algorithm proposed here is experimentally found to be even more accurate than the original version.
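To make the batch-correlation setting concrete, below is a minimal illustrative sketch (not the authors' algorithm) of one simple way to account for correlated sub-groups: each training sample is down-weighted by the size of its batch, so that every batch contributes equal total weight to the loss and a large, internally correlated batch cannot dominate training simply by containing more samples. The weighting scheme, the toy data, and all function names here are assumptions for illustration only.

```python
# Hypothetical sketch: batch-weighted logistic regression, where correlated
# batches of samples each receive equal total weight in the training loss.
# This is NOT the method from the paper; it only illustrates the non-IID setting.
import numpy as np

def batch_weights(batch_ids):
    """Weight each sample by 1 / (its batch size), so each batch has total weight 1."""
    ids, counts = np.unique(batch_ids, return_counts=True)
    size = dict(zip(ids, counts))
    return np.array([1.0 / size[b] for b in batch_ids])

def fit_weighted_logreg(X, y, w, lr=0.1, n_iter=500):
    """Weighted logistic regression fit by plain gradient descent; y in {0, 1}."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ theta))      # predicted probabilities
        grad = Xb.T @ (w * (p - y)) / w.sum()      # weighted log-loss gradient
        theta -= lr * grad
    return theta

def predict(X, theta):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (1.0 / (1.0 + np.exp(-Xb @ theta)) > 0.5).astype(int)

# Toy data: three batches whose members are tightly clustered (i.e. correlated),
# loosely mimicking multiple candidate lesions drawn from the same patient in CAD.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.1, (20, 2)),    # large correlated positive batch
               rng.normal(2.5, 0.1, (3, 2)),     # small positive batch
               rng.normal(-2.0, 0.1, (5, 2))])   # negative batch
y = np.array([1] * 23 + [0] * 5)
batch_ids = np.array([0] * 20 + [1] * 3 + [2] * 5)

theta = fit_weighted_logreg(X, y, batch_weights(batch_ids))
print(predict(X, theta))
```

Under the i.i.d. assumption every sample would get weight 1, and the 20-sample batch would contribute roughly four times the loss of the 5-sample batch; the per-batch weighting above removes that imbalance, which is one intuition for why batch-aware training can help on grouped data such as CAD candidates from the same patient.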