Sublinear algorithms for penalized logistic regression in massive datasets

  • Authors:
  • Haoruo Peng, Zhengyu Wang, Edward Y. Chang, Shuchang Zhou, Zhihua Zhang

  • Affiliations:
  • Haoruo Peng: Google Research Beijing, Beijing, China; Department of Computer Science and Technology, Tsinghua University, Beijing, China
  • Zhengyu Wang: Google Research Beijing, Beijing, China; Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
  • Edward Y. Chang: Google Research Beijing, Beijing, China
  • Shuchang Zhou: Google Research Beijing, Beijing, China
  • Zhihua Zhang: Google Research Beijing, Beijing, China; College of Computer Science and Technology, Zhejiang University, Zhejiang, China

  • Venue:
  • ECML PKDD'12: Proceedings of the 2012 European Conference on Machine Learning and Knowledge Discovery in Databases, Part I
  • Year:
  • 2012


Abstract

Penalized logistic regression (PLR) is a widely used supervised learning model. In this paper, we consider its application to large-scale data problems and resort to a stochastic primal-dual approach for solving PLR. In particular, we employ a random sampling technique in the primal step and a multiplicative weights method in the dual step. This yields an optimization method whose per-iteration cost is sublinear in both the volume and the dimensionality of the training data. We develop concrete algorithms for PLR with ℓ2-norm and ℓ1-norm penalties, respectively. Experimental results on several large-scale, high-dimensional datasets demonstrate both the efficiency and the accuracy of our algorithms.
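The primal-dual scheme described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction under assumptions, not the authors' actual algorithm: the function name, step sizes, the clipping thresholds, and the exact update rules are all invented for exposition. The dual step keeps a multiplicative-weights distribution over training examples; the primal step takes a stochastic gradient step on one sampled example, estimating its margin from a single ℓ2-sampled coordinate of the weight vector (the usual source of sublinearity in the dimension in this family of methods).

```python
import numpy as np

def sublinear_plr_l2(X, y, lam=0.1, T=2000, eta=0.1, eta_mw=0.05, rng=None):
    """Illustrative stochastic primal-dual solver for l2-penalized
    logistic regression (a sketch, NOT the paper's exact algorithm).

    Dual step:   multiplicative weights over the n training examples.
    Primal step: stochastic gradient step on one sampled example, with
                 the margin estimated from a single l2-sampled
                 coordinate of w.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = X.shape
    w = np.zeros(d)
    p = np.full(n, 1.0 / n)        # dual distribution over examples
    w_sum = np.zeros(d)
    for _ in range(T):
        i = rng.choice(n, p=p)     # sample an example from the dual
        wn = np.dot(w, w)
        if wn > 0:
            q = w * w / wn         # l2 sampling of one coordinate of w
            j = rng.choice(d, p=q)
            # unbiased (clipped) estimate of the margin y_i <x_i, w>
            margin = np.clip(y[i] * w[j] * X[i, j] / q[j], -30.0, 30.0)
        else:
            margin = 0.0
        # primal step: noisy gradient of the logistic loss + ridge term
        g = -y[i] * X[i] / (1.0 + np.exp(margin)) + 2.0 * lam * w
        w -= eta * g
        # dual step: upweight the sampled example by its current loss
        # (a genuinely sublinear method would also estimate this loss;
        #  it is computed exactly here for clarity)
        loss_i = np.log1p(np.exp(np.clip(-y[i] * (X[i] @ w), -30.0, 30.0)))
        p[i] *= np.exp(eta_mw * min(loss_i, 10.0))
        p /= p.sum()
        w_sum += w
    return w_sum / T               # averaged primal iterate
```

The multiplicative-weights update concentrates the dual distribution on high-loss examples, so later primal steps focus on the hardest points, while iterate averaging smooths out the variance introduced by the single-coordinate margin estimates.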