Massive, high-dimensional real-world data provide rich information for logistic regression classification, but they also pose a serious challenge to building models accurately and efficiently. In this paper, we propose a scalable subspace logistic regression algorithm. It can be viewed as an advanced classification algorithm that combines random subspace sampling with traditional logistic regression, aiming to handle massive, high-dimensional data effectively. We show that the algorithm is particularly well suited to distributed computing environments, and we implement it on the Hadoop platform with the MapReduce programming framework. Experiments on real and synthetic datasets demonstrate that our algorithm outperforms other logistic regression algorithms.
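To make the core idea concrete, here is a minimal sketch of random subspace sampling combined with logistic regression: each base model is trained on a random subset of the features, and predictions are averaged over the ensemble. This is an illustrative reconstruction only, not the authors' implementation; it assumes scikit-learn's `LogisticRegression`, and the function names and parameters (`n_models`, `subspace_frac`) are invented for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def subspace_logistic_fit(X, y, n_models=10, subspace_frac=0.5, seed=0):
    """Train an ensemble of logistic regression models, each on a
    random subset of the features (random subspace sampling).
    Illustrative sketch; not the paper's actual algorithm."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    k = max(1, int(subspace_frac * d))
    models = []
    for _ in range(n_models):
        feats = rng.choice(d, size=k, replace=False)  # random subspace
        clf = LogisticRegression().fit(X[:, feats], y)
        models.append((feats, clf))
    return models

def subspace_logistic_predict(models, X):
    """Average the predicted class-1 probabilities over the ensemble
    and threshold at 0.5 to obtain binary labels."""
    probs = np.mean([clf.predict_proba(X[:, feats])[:, 1]
                     for feats, clf in models], axis=0)
    return (probs >= 0.5).astype(int)
```

Because each base model sees only a fraction of the features, the per-model fits are cheap and independent, which is what makes the approach amenable to a distributed MapReduce-style implementation (one model per map task, aggregation in the reducer).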