A new decision tree classification method for mining high-speed data streams based on threaded binary search trees

Authors:
Tao Wang;Zhoujun Li;Xiaohua Hu;Yuejin Yan;Huowang Chen
Affiliations:
Computer School, National University of Defense Technology, Changsha, China;School of Computer Science & Engineering, Beihang University, Beijing, China;College of Information Science and Technology, Drexel University, Philadelphia, PA;Computer School, National University of Defense Technology, Changsha, China;Computer School, National University of Defense Technology, Changsha, China
Venue:
PAKDD'07 Proceedings of the 2007 international conference on Emerging technologies in knowledge discovery and data mining
Year:
2007

Abstract

One of the most important algorithms for mining data streams is VFDT. It uses the Hoeffding inequality to achieve a probabilistic bound on the accuracy of the constructed tree. Gama et al. have extended VFDT in two directions: their system, VFDTc, can deal with continuous data and use more powerful classification techniques at tree leaves. In this paper, we revisit this problem and implement a system, VFDTt, on top of VFDT and VFDTc. We make the following three contributions: 1) We present a threaded binary search tree (TBST) approach for efficiently handling continuous attributes. It builds a threaded binary search tree whose processing time for inserting values is O(n log n), while VFDT's processing time is O(n^2). When a new example arrives, VFDTc needs to update O(log n) attribute tree nodes, but VFDTt needs to update only the one necessary node. 2) We improve the method for finding the best split-test point of a given continuous attribute. Compared to the method used in VFDTc, processing time improves from O(n log n) to O(n). 3) Compared to VFDTc, the number of candidate split-test points in VFDTt decreases from O(n) to O(log n). Compared to VFDT, the most relevant property of our system is an average reduction of 25.53% in processing time, while keeping the same tree size and accuracy. Overall, the techniques introduced here significantly improve the efficiency of decision tree classification on data streams.
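The Hoeffding inequality mentioned in the abstract can be illustrated with a small sketch. This is not code from the paper: the function names `hoeffding_bound` and `can_split` are hypothetical, and the choice of log2(number of classes) as the range of information gain follows the usual presentation of Hoeffding trees rather than anything stated here.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean observed
    over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain, second_gain, num_classes, delta, n):
    """Hypothetical helper: split a leaf once the observed gap between the
    two best attributes exceeds the Hoeffding bound.

    Assumption: the split measure is information gain, whose range is
    taken to be log2(num_classes).
    """
    eps = hoeffding_bound(math.log2(num_classes), delta, n)
    return best_gain - second_gain > eps


if __name__ == "__main__":
    # The bound shrinks as more examples reach the leaf.
    for n in (100, 1000, 10000):
        print(n, round(hoeffding_bound(1.0, 1e-7, n), 4))
```

Because the bound depends only on the range, the confidence parameter and the number of examples seen, a leaf can decide to split without storing the examples themselves.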
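The idea behind contributions 1) and 2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' VFDTt implementation: the class and function names are hypothetical, the thread is kept as an explicit `next` pointer to the in-order successor instead of reusing empty child pointers, the tree is not balanced, and information gain is assumed as the split measure. Each node stores class counts only for its own value, so an arriving example touches one node, and the split search walks the threads once in sorted order.

```python
import math
from collections import Counter


class Node:
    """One distinct attribute value with class counts for that value only."""

    def __init__(self, key):
        self.key = key
        self.counts = Counter()  # class label -> examples with exactly this value
        self.left = None
        self.right = None
        self.next = None         # thread to the in-order successor


class ThreadedBST:
    def __init__(self):
        self.root = None
        self.head = None         # node holding the smallest key
        self.totals = Counter()  # class label -> examples seen overall

    def insert(self, key, label):
        """Update exactly one node: bump an existing one or link in a new one."""
        self.totals[label] += 1
        parent = pred = succ = None
        node = self.root
        while node is not None:
            if key == node.key:
                node.counts[label] += 1
                return
            parent = node
            if key < node.key:
                succ, node = node, node.left   # last left turn is the successor
            else:
                pred, node = node, node.right  # last right turn is the predecessor
        new = Node(key)
        new.counts[label] = 1
        if parent is None:
            self.root = new
        elif key < parent.key:
            parent.left = new
        else:
            parent.right = new
        # Splice the new node into the sorted thread between pred and succ.
        new.next = succ
        if pred is None:
            self.head = new
        else:
            pred.next = new

    def in_order(self):
        """Yield nodes in ascending key order by following the threads."""
        node = self.head
        while node is not None:
            yield node
            node = node.next


def entropy(counts):
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def best_split(tree):
    """Return (threshold, gain) for the best test `value <= threshold`,
    found in a single pass over the threads."""
    n = sum(tree.totals.values())
    base = entropy(tree.totals)
    left = Counter()
    best = (None, 0.0)
    for node in tree.in_order():
        if node.next is None:
            break  # splitting at the largest value leaves the right side empty
        left.update(node.counts)
        right = tree.totals - left
        n_left = sum(left.values())
        gain = base - (n_left / n) * entropy(left) - ((n - n_left) / n) * entropy(right)
        if gain > best[1]:
            best = (node.key, gain)
    return best


if __name__ == "__main__":
    tree = ThreadedBST()
    for value, label in [(3, "b"), (1, "a"), (4, "b"), (2, "a")]:
        tree.insert(value, label)
    print([node.key for node in tree.in_order()])
    print(best_split(tree))
```

In this sketch the running `left` counter plays the role that per-node cumulative counts would otherwise play, which is why an insertion does not need to update every node on the search path. The reduction of candidate split points to O(log n) described in contribution 3) is not modeled here.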