The R*-tree: an efficient and robust access method for points and rectangles
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
C4.5: programs for machine learning
C4.5: programs for machine learning
Mining high-speed data streams
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
A streaming ensemble algorithm (SEA) for large-scale classification
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
R-trees: a dynamic index structure for spatial searching
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
An introduction to spatial database systems
The VLDB Journal — The International Journal on Very Large Data Bases - Spatial Database Systems
Similarity Indexing with the SS-tree
ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
Incremental Support Vector Machine Construction
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
M-tree: An Efficient Access Method for Similarity Search in Metric Spaces
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
The R+-Tree: A Dynamic Index for Multi-Dimensional Objects
VLDB '87 Proceedings of the 13th International Conference on Very Large Data Bases
Hilbert R-tree: An Improved R-tree using Fractals
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Multidimensional Access Methods: Trees Have Grown Everywhere
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Mining concept-drifting data streams using ensemble classifiers
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
StreaMon: an adaptive engine for stream query processing
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Rule extraction from linear support vector machines
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Using additive expert ensembles to cope with concept drift
ICML '05 Proceedings of the 22nd international conference on Machine learning
Data Streams: Models and Algorithms (Advances in Database Systems)
Data Streams: Models and Algorithms (Advances in Database Systems)
Ensemble Pruning Via Semi-definite Programming
The Journal of Machine Learning Research
Near-optimal algorithms for shared filter evaluation in data stream systems
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Categorizing and mining concept drifting data streams
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
On Appropriate Assumptions to Mine Data Streams: Analysis and Practice
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Scalable ranked publish/subscribe
Proceedings of the VLDB Endowment
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Identifying suspicious URLs: an application of large-scale online learning
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
New ensemble methods for evolving data streams
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Proceedings of the VLDB Endowment
Ensemble pruning via individual contribution ordering
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Robust ensemble learning for mining noisy data streams
Decision Support Systems
Active learning from stream data using optimal weight classifier ensemble
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Classifier and Cluster Ensembles for Mining Concept Drifting Data Streams
ICDM '10 Proceedings of the 2010 IEEE International Conference on Data Mining
LIBSVM: A library for support vector machines
ACM Transactions on Intelligent Systems and Technology (TIST)
The ClusTree: indexing micro-clusters for anytime stream mining
Knowledge and Information Systems
Predictive Data Stream Filtering
WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 03
Mining frequent patterns across multiple data streams
Proceedings of the 20th ACM international conference on Information and knowledge management
Continuous data stream query in the cloud
Proceedings of the 20th ACM international conference on Information and knowledge management
Group detection and relation analysis research for web social network
APWeb'12 Proceedings of the 14th international conference on Web Technologies and Applications
Proceedings of the 2nd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
Hi-index | 0.00 |
Ensemble learning has become a common tool for data stream classification, being able to handle large volumes of stream data and concept drifting. Previous studies focus on building accurate prediction models from stream data. However, a linear scan of a large number of base classifiers in the ensemble during prediction incurs significant costs in response time, preventing ensemble learning from being practical for many real world time-critical data stream applications, such as Web traffic stream monitoring, spam detection, and intrusion detection. In these applications, data streams usually arrive at a speed of GB/second, and it is necessary to classify each stream record in a timely manner. To address this problem, we propose a novel Ensemble-tree (E-tree for short) indexing structure to organize all base classifiers in an ensemble for fast prediction. On one hand, E-trees treat ensembles as spatial databases and employ an R-tree like height-balanced structure to reduce the expected prediction time from linear to sub-linear complexity. On the other hand, E-trees can automatically update themselves by continuously integrating new classifiers and discarding outdated ones, well adapting to new trends and patterns underneath data streams. Experiments on both synthetic and real-world data streams demonstrate the performance of our approach.