In high dimensional data, a large portion of the features are often not informative of the class of the objects. Random forest algorithms typically use simple random sampling of features to build their decision trees, and consequently select many subspaces that contain few, if any, informative features. In this paper we propose a stratified sampling method for selecting feature subspaces for random forests on high dimensional data. The key idea is to stratify features into two groups: one containing strong informative features and the other weak informative features. For feature subspace selection, we then randomly select features from each group in proportion to its size. The advantage of stratified sampling is that it ensures each subspace contains enough informative features for classification in high dimensional data. Tests on both synthetic data and real data sets from gene classification, image categorization and face recognition consistently demonstrate the effectiveness of the new method. Its performance is shown to be better than that of state-of-the-art algorithms, including SVM, four variants of random forests (RF, ERT, enrich-RF and oblique-RF), and nearest neighbor (NN) algorithms.
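The stratified subspace selection described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes each feature already has an informativeness score (e.g., from a correlation or mutual-information measure, which the paper would compute from the training data), and the function name, the threshold parameter, and the proportional-allocation details are hypothetical choices made here for clarity.

```python
import random

def stratified_subspace(scores, subspace_size, threshold, seed=None):
    """Sample a feature subspace for one tree node (hypothetical sketch).

    Features are stratified into a strong group (score >= threshold) and a
    weak group (score < threshold); indices are then drawn from each group
    in proportion to its share of all features, so every subspace is
    guaranteed some informative features when any exist.
    """
    rng = random.Random(seed)
    strong = [i for i, s in enumerate(scores) if s >= threshold]
    weak = [i for i, s in enumerate(scores) if s < threshold]

    # Proportional allocation, but keep at least one strong feature
    # whenever the strong group is non-empty (assumed tie-breaking rule).
    n_strong = round(subspace_size * len(strong) / len(scores))
    if strong:
        n_strong = min(max(n_strong, 1), len(strong))
    n_weak = min(subspace_size - n_strong, len(weak))

    return rng.sample(strong, n_strong) + rng.sample(weak, n_weak)
```

Contrast with plain random forests, which would draw all `subspace_size` indices uniformly from the full feature set; when informative features are rare, that uniform draw often yields subspaces with none of them, which is the failure mode the stratification is designed to avoid.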