Sparse Multinomial Logistic Regression: Fast Algorithms and Generalization Bounds
IEEE Transactions on Pattern Analysis and Machine Intelligence
LungCAD: a clinically approved, machine learning system for lung cancer detection
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Mean field variational Bayesian inference for support vector machine classification
Computational Statistics & Data Analysis
Hi-index | 0.01 |
Most methods for classifier design assume that the training samples are drawn independently and identically from an unknown data generating distribution, although this assumption is violated in several real life problems. Relaxing this i.i.d. assumption, we consider algorithms from the statistics literature for the more realistic situation where batches or sub-groups of training samples may have internal correlations, although the samples from different batches may be considered to be uncorrelated. Next, we propose simpler (more efficient) variants that scale well to large datasets; theoretical results from the literature are provided to support their validity. Experimental results from real-life computer aided diagnosis (CAD) problems indicate that relaxing the i.i.d. assumption leads to statistically significant improvements in the accuracy of the learned classifier. Surprisingly, the simpler algorithm proposed here is experimentally found to be even more accurate than the original version.