An Efficient Classification System Based on Binary Search Trees for Data Streams Mining
ICONS '07 Proceedings of the Second International Conference on Systems
VFDT is one of the most important algorithms for mining data streams. It uses the Hoeffding inequality to obtain a probabilistic bound on the accuracy of the constructed tree. Gama et al. extended VFDT in two directions: their system, VFDTc, can handle continuous data and use more powerful classification techniques at the tree leaves. In this paper, we revisit this problem and implement a system, VFDTt, on top of VFDT and VFDTc. We make three contributions. 1) We present a threaded binary search tree (TBST) approach for efficiently handling continuous attributes. It builds a threaded binary search tree whose processing time for inserting values is O(n log n), whereas VFDT's processing time is O(n^2). When a new example arrives, VFDTc needs to update O(log n) attribute tree nodes, but VFDTt needs to update only one node. 2) We improve the method for finding the best split-test point of a given continuous attribute. Compared to the method used in VFDTc, the processing time improves from O(n log n) to O(n). 3) Compared to VFDTc, the number of candidate split tests in VFDTt decreases from O(n) to O(log n). Compared to VFDT, the most relevant property of our system is an average reduction of 25.53% in processing time while keeping the same tree size and accuracy. Overall, the techniques introduced here significantly improve the efficiency of decision tree classification on data streams.
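The two ideas named in the abstract, the Hoeffding bound and a threaded binary search tree over the values of one continuous attribute, can be illustrated with a minimal sketch. This is not the authors' VFDTt implementation: the class names `Node` and `ThreadedBST`, the per-node class counters, and the use of information gain as the split criterion are all assumptions made for illustration. The tree is unbalanced, so the O(log n) insertion depth holds only on average for values arriving in random order. The sketch scans every boundary between distinct values and does not attempt the reduction of candidate split tests to O(log n) described in contribution 3.

```python
import math
from collections import Counter


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability at least 1 - delta, the true mean
    lies within epsilon of the mean observed over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def entropy(counts):
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)


class Node:
    def __init__(self, value):
        self.value = value
        self.counts = Counter()  # class label -> examples with exactly this value
        self.left = None
        self.right = None
        self.next = None         # thread to the in-order successor


class ThreadedBST:
    """Hypothetical per-attribute structure: a BST threaded in sorted order."""

    def __init__(self):
        self.root = None
        self.head = None         # node holding the smallest value seen so far
        self.totals = Counter()  # class label -> examples seen overall

    def insert(self, value, label):
        self.totals[label] += 1
        if self.root is None:
            self.root = self.head = Node(value)
            self.root.counts[label] += 1
            return
        node, pred, succ = self.root, None, None
        while True:
            if value == node.value:
                node.counts[label] += 1  # existing value: only this node changes
                return
            if value < node.value:
                succ = node              # last node we turned left at
                if node.left is None:
                    node.left = new = Node(value)
                    break
                node = node.left
            else:
                pred = node              # last node we turned right at
                if node.right is None:
                    node.right = new = Node(value)
                    break
                node = node.right
        new.counts[label] += 1
        new.next = succ                  # splice the new node into the thread
        if pred is None:
            self.head = new
        else:
            pred.next = new

    def values(self):
        """Distinct values in ascending order, read off the thread."""
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out

    def best_split(self):
        """One pass along the thread; returns (threshold, information gain)."""
        n = sum(self.totals.values())
        base = entropy(self.totals)
        left = Counter()
        best_gain, best_threshold = 0.0, None
        node = self.head
        while node is not None and node.next is not None:
            left.update(node.counts)
            right = self.totals - left
            n_left = sum(left.values())
            gain = (base - (n_left / n) * entropy(left)
                    - ((n - n_left) / n) * entropy(right))
            if gain > best_gain:
                best_gain = gain
                best_threshold = (node.value + node.next.value) / 2
            node = node.next
        return best_threshold, best_gain


tree = ThreadedBST()
for value, label in [(3.0, "b"), (1.0, "a"), (4.0, "b"), (2.0, "a"), (2.0, "a")]:
    tree.insert(value, label)
threshold, gain = tree.best_split()
```

Because the thread keeps the distinct values in sorted order, the split search is a single linear walk with running class counts, and no sort is needed at evaluation time. In a Hoeffding tree, the gap between the best and second-best attribute gains would then be compared against `hoeffding_bound(...)` before committing to a split.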