Machine Learning
Combination of Multiple Classifiers Using Local Accuracy Estimates
IEEE Transactions on Pattern Analysis and Machine Intelligence
On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
Machine Learning - Special issue on learning with probabilistic representations
The Random Subspace Method for Constructing Decision Forests
IEEE Transactions on Pattern Analysis and Machine Intelligence
Mining high-speed data streams
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining time-changing data streams
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
A streaming ensemble algorithm (SEA) for large-scale classification
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Machine Learning
Computational Statistics & Data Analysis - Nonlinear methods and data mining
Mining concept-drifting data streams using ensemble classifiers
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Toward Integrating Feature Selection Algorithms for Classification and Clustering
IEEE Transactions on Knowledge and Data Engineering
A Theoretical and Experimental Analysis of Linear Combiners for Multiple Classifier Systems
IEEE Transactions on Pattern Analysis and Machine Intelligence
Multiclass Boosting for Weak Classifiers
The Journal of Machine Learning Research
Adapted One-versus-All Decision Trees for Data Stream Classification
IEEE Transactions on Knowledge and Data Engineering
Diversity in Combinations of Heterogeneous Classifiers
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
New ensemble methods for evolving data streams
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Active Learning with Adaptive Heterogeneous Ensembles
ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
The Journal of Machine Learning Research
On the Dual Formulation of Boosting Algorithms
IEEE Transactions on Pattern Analysis and Machine Intelligence
The nature of data streams requires classification algorithms to be real-time, efficient, and able to cope with high-dimensional data that arrive continuously. In high-dimensional datasets, not all features are critical for training a classifier. To improve the performance of data stream classification, we propose an algorithm called HEFT-Stream (Heterogeneous Ensemble with Feature drifT for Data Streams) that incorporates feature selection into a heterogeneous ensemble to adapt to different types of concept drift. As an example of the proposed framework, we first modify the FCBF [13] algorithm so that it dynamically updates the relevant feature subset for data streams. Next, a heterogeneous ensemble is constructed from different online classifiers, including Online Naive Bayes and CVFDT [5]. Empirical results show that our ensemble classifier outperforms state-of-the-art ensemble classifiers (AWE [15] and OnlineBagging [21]) in terms of accuracy, speed, and scalability. The success of HEFT-Stream opens new research directions in understanding the relationship between feature selection techniques and ensemble learning to achieve better classification performance.
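The general idea of the framework (re-select features per window, rebuild a heterogeneous ensemble when the relevant subset changes) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: only the relevance step of FCBF (symmetrical uncertainty against the class) is used and redundancy removal is omitted; a one-feature stump stands in for CVFDT; members vote with weights equal to their running accuracy; and features are discrete. All class and function names, the window size, and the threshold are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import product

def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(xs, ys):
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    return 2.0 * (hx + hy - entropy(list(zip(xs, ys)))) / (hx + hy)

def select_features(window, threshold=0.5):
    # Relevance step only (assumption): keep features whose SU with the class is high.
    labels = [y for _, y in window]
    return [j for j in range(len(window[0][0]))
            if symmetrical_uncertainty([x[j] for x, _ in window], labels) >= threshold]

class OnlineNaiveBayes:
    def __init__(self, features):
        self.features = features
        self.class_counts = Counter()
        self.value_counts = defaultdict(Counter)  # (feature, label) -> value counts

    def learn(self, x, y):
        self.class_counts[y] += 1
        for j in self.features:
            self.value_counts[(j, y)][x[j]] += 1

    def predict(self, x):
        if not self.class_counts:
            return None
        total = sum(self.class_counts.values())
        def score(y):
            s = math.log(self.class_counts[y] / total)
            for j in self.features:
                counts = self.value_counts[(j, y)]
                # Laplace smoothing; the "+ 2" assumes binary features.
                s += math.log((counts[x[j]] + 1) / (self.class_counts[y] + 2))
            return s
        return max(self.class_counts, key=score)

class OnlineStump:
    """Hypothetical stand-in for a tree learner: majority label per feature value."""
    def __init__(self, feature):
        self.feature = feature
        self.counts = defaultdict(Counter)

    def learn(self, x, y):
        self.counts[x[self.feature]][y] += 1

    def predict(self, x):
        c = self.counts.get(x[self.feature])
        return c.most_common(1)[0][0] if c else None

class HeterogeneousEnsemble:
    def __init__(self, members):
        self.members = members
        self.correct = [1] * len(members)  # running accuracy = correct / seen
        self.seen = [1] * len(members)

    def predict(self, x):
        votes = Counter()
        for m, c, s in zip(self.members, self.correct, self.seen):
            p = m.predict(x)
            if p is not None:
                votes[p] += c / s
        return votes.most_common(1)[0][0] if votes else None

    def learn(self, x, y):
        for i, m in enumerate(self.members):
            self.seen[i] += 1
            self.correct[i] += int(m.predict(x) == y)  # test, then train
            m.learn(x, y)

def run(stream, window_size=40):
    window, selected, ensemble, history = [], None, None, []
    for x, y in stream:
        window.append((x, y))
        if ensemble is not None:
            ensemble.learn(x, y)
        if len(window) == window_size:
            new_selected = select_features(window)
            if new_selected and new_selected != selected:
                # Relevant subset changed: rebuild and warm-start on this window.
                selected = new_selected
                ensemble = HeterogeneousEnsemble(
                    [OnlineNaiveBayes(selected), OnlineStump(selected[0])])
                for wx, wy in window:
                    ensemble.learn(wx, wy)
            history.append(selected)
            window = []
    return history, ensemble

# Synthetic stream: the label follows feature 0, then drifts to feature 2.
combos = list(product([0, 1], repeat=3))
stream = [(x, x[0]) for x in combos * 10] + [(x, x[2]) for x in combos * 10]
history, ensemble = run(stream)
```

On this synthetic stream, `history` records the feature subset chosen at the end of each window, so the switch from feature 0 to feature 2 is visible as a change in the selected indices. Rebuilding the whole ensemble on a change is the simplest possible reaction; a real system would more likely update or replace individual members.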