Massive, high-dimensional real-world data provide rich information for logistic regression classification, but they also pose a serious challenge to building models accurately and efficiently. In this paper, we propose a scalable subspace logistic regression algorithm. It can be viewed as an advanced classification algorithm that combines random subspace sampling with traditional logistic regression, aiming to handle massive, high-dimensional data effectively. We show that the algorithm is particularly well suited to distributed computing environments, and we implement it on the Hadoop platform with the MapReduce programming framework. Experiments on real and synthetic datasets demonstrate that our algorithm outperforms other logistic regression algorithms.
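To make the core idea concrete, here is a minimal sketch of random subspace sampling combined with logistic regression: each base model is trained on a random subset of the features, and predictions are averaged over the ensemble. This is an illustrative reconstruction only, not the authors' implementation; it assumes scikit-learn's `LogisticRegression`, and the function names and parameters (`n_models`, `subspace_frac`) are invented for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def subspace_logistic_fit(X, y, n_models=10, subspace_frac=0.5, seed=0):
    """Train an ensemble of logistic regression models, each on a
    random subset of the features (random subspace sampling).
    Illustrative sketch; not the paper's actual algorithm."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    k = max(1, int(subspace_frac * d))
    models = []
    for _ in range(n_models):
        feats = rng.choice(d, size=k, replace=False)  # random subspace
        clf = LogisticRegression().fit(X[:, feats], y)
        models.append((feats, clf))
    return models

def subspace_logistic_predict(models, X):
    """Average the predicted class-1 probabilities over the ensemble
    and threshold at 0.5 to obtain binary labels."""
    probs = np.mean([clf.predict_proba(X[:, feats])[:, 1]
                     for feats, clf in models], axis=0)
    return (probs >= 0.5).astype(int)
```

Because each base model sees only a fraction of the features, the per-model fits are cheap and independent, which is what makes the approach amenable to a distributed MapReduce-style implementation (one model per map task, aggregation in the reducer).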