Scalable subspace logistic regression models for high dimensional data

  • Authors:
  • Shuang Wang; Xiaojun Chen; Joshua Zhexue Huang; Shengzhong Feng

  • Affiliations:
  • Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China (all authors)

  • Venue:
  • APWeb'12: Proceedings of the 14th Asia-Pacific International Conference on Web Technologies and Applications
  • Year:
  • 2012

Abstract

Massive, high-dimensional real-world data provide more information for logistic regression classification, but they also make it challenging to build models accurately and efficiently. In this paper, we propose a scalable subspace logistic regression algorithm. It combines a random subspace sampling method with the traditional logistic regression algorithm and is designed to handle massive, high-dimensional data effectively. We prove that the algorithm is particularly well suited to distributed computing environments, and we implement it on the Hadoop platform using the MapReduce programming framework. Experiments on real and synthetic datasets show that our algorithm performs better than other logistic regression algorithms.
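
The general idea of combining random subspace sampling with logistic regression can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the abstract does not specify how sub-models are trained or combined, so the batch gradient descent fit, the averaging of predicted probabilities, and all names and parameters (`fit_logistic`, `fit_subspace_ensemble`, `predict_proba`, `n_models`, `subspace_size`) are assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    # Clip the argument to avoid overflow in exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def fit_logistic(X, y, lr=0.1, n_iter=200):
    """Fit logistic regression with plain batch gradient descent (assumed optimizer)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        residual = sigmoid(X @ w + b) - y   # gradient of the log-loss w.r.t. the logits
        w -= lr * (X.T @ residual) / n
        b -= lr * residual.mean()
    return w, b


def fit_subspace_ensemble(X, y, n_models=10, subspace_size=5, seed=0):
    """Train one logistic model per randomly sampled feature subspace.

    Each iteration is independent of the others, so in a MapReduce-style
    setting each one could in principle run as a separate task.
    """
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        # Sample a feature subset without replacement.
        feats = rng.choice(X.shape[1], size=subspace_size, replace=False)
        w, b = fit_logistic(X[:, feats], y)
        models.append((feats, w, b))
    return models


def predict_proba(models, X):
    """Combine sub-models by averaging their probabilities (assumed combination rule)."""
    probs = [sigmoid(X[:, feats] @ w + b) for feats, w, b in models]
    return np.mean(probs, axis=0)
```

In this sketch each sub-model sees only `subspace_size` of the original features, which keeps every individual fit small regardless of the total dimensionality. A distributed version would have to decide how data and feature subsets are partitioned across workers, which the sketch does not address.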