A new fuzzy decision tree classification method for mining high-speed data streams based on binary search trees

Authors:
Zhoujun Li; Tao Wang; Ruoxue Wang; Yuejin Yan; Huowang Chen
Affiliations:
School of Computer Science & Engineering, Beihang University, Beijing, China; Computer School, National University of Defense Technology, Changsha, China; Journal of Computer Research and Development, Beijing, China; Computer School, National University of Defense Technology, Changsha, China; Computer School, National University of Defense Technology, Changsha, China
Venue:
FAW'07 Proceedings of the 1st annual international conference on Frontiers in algorithmics
Year:
2007

References:

On the Handling of Continuous-Valued Attributes in Decision Tree Generation. Machine Learning.
C4.5: programs for machine learning.
The impact of changing populations on classifier performance. KDD '99: Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining.
Mining high-speed data streams. Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining.
Mining time-changing data streams. Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining.
Models and issues in data stream systems. Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
Knowledge Discovery and Data Mining: The Info-Fuzzy Network (IFN) Methodology.
Incremental Induction of Decision Trees. Machine Learning.
Data streams: algorithms and applications. SODA '03: Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms.
SLIQ: A Fast Scalable Classifier for Data Mining. EDBT '96: Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology.
Incremental Fuzzy Decision Trees. KI '02: Proceedings of the 25th Annual German Conference on AI: Advances in Artificial Intelligence.
A framework for diagnosing changes in evolving data streams. Proceedings of the 2003 ACM SIGMOD international conference on Management of data.
STREAM: the Stanford stream data manager (demonstration description). Proceedings of the 2003 ACM SIGMOD international conference on Management of data.
Mining concept-drifting data streams using ensemble classifiers. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining.
Accurate decision trees for mining high-speed data streams. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining.
Efficient decision tree construction on streaming data. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining.
On demand classification of data streams. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining.
Mining data streams: a review. ACM SIGMOD Record.
An Efficient Classification System Based on Binary Search Trees for Data Streams Mining. ICONS '07: Proceedings of the Second International Conference on Systems.
Online classification of nonstationary data streams. Intelligent Data Analysis.
Detecting change in data streams. VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30.
StreamMiner: a classifier ensemble-based engine to mine concept-drifting data streams. VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30.
A new decision tree classification method for mining high-speed data streams based on threaded binary search trees. PAKDD'07: Proceedings of the 2007 international conference on Emerging technologies in knowledge discovery and data mining.
Fuzzy decision trees: issues and methods. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics.
Building fast decision trees from large training sets. Intelligent Data Analysis.

Abstract

Decision tree construction is a well-studied problem in data mining. Recently, there has been much interest in mining data streams. Domingos and Hulten presented VFDT, a one-pass algorithm for decision tree construction that uses the Hoeffding inequality to obtain a probabilistic bound on the accuracy of the constructed tree. Gama et al. extended VFDT in two directions: their system, VFDTc, can handle continuous data and uses more powerful classification techniques at the tree leaves. Peng et al. presented a soft discretization method for handling continuous attributes in data mining. In this paper, we revisit these problems and implement sVFDT, a system for data stream mining. We make the following contributions: 1) We present a binary search tree (BST) approach for efficiently handling continuous attributes; its processing time for inserting values is O(n log n), whereas VFDT's is O(n^2). 2) We improve the method for finding the best split-test point of a given continuous attribute; compared with the method used in VFDTc, this reduces the processing time from O(n log n) to O(n). 3) Compared with VFDTc, sVFDT reduces the number of candidate split tests from O(n) to O(log n). 4) We improve the soft discretization method to increase classification accuracy in data stream mining.
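The Hoeffding-bound test that VFDT relies on can be illustrated with a short sketch. With probability 1 - delta, the mean of n observations of a variable with range R lies within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of its true mean, so a leaf can split once the gain of the best attribute exceeds the runner-up's by more than epsilon. The sketch below assumes information gain as the split measure (so R = log2 of the number of classes) and a VFDT-style tie threshold; the function names and default values are illustrative, not taken from sVFDT.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with range `value_range` lies within epsilon of its mean over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n_classes: int,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """VFDT-style split decision at a leaf that has seen n examples (illustrative sketch)."""
    # Information gain ranges over [0, log2(c)] for c classes.
    value_range = math.log2(n_classes)
    epsilon = hoeffding_bound(value_range, delta, n)
    # Split when the best attribute is reliably better than the runner-up,
    # or when epsilon is so small that the two are effectively tied.
    return best_gain - second_gain > epsilon or epsilon < tie_threshold


# Usage: a two-class problem after 1000 examples have reached a leaf.
print(hoeffding_bound(1.0, 1e-7, 1000))  # about 0.0898
print(should_split(0.5, 0.1, n_classes=2, delta=1e-7, n=1000))  # True
```

Because epsilon shrinks as 1/sqrt(n), a leaf that keeps seeing close competitors eventually splits on the tie threshold instead of waiting indefinitely.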
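The abstract does not describe sVFDT's tree layout, so the following is only a minimal sketch of the general idea behind contributions 1) and 2): store each distinct value of one continuous attribute in a binary search tree together with per-class counts, then find the best threshold with a single in-order pass that accumulates the class counts on the left side. The names `Node`, `AttributeBST`, and `best_split` are hypothetical, and information gain with tests of the form `x <= value` is an assumed split criterion.

```python
import math
from collections import Counter
from typing import Optional


class Node:
    """One distinct attribute value plus the class counts of examples with that value."""
    def __init__(self, value: float):
        self.value = value
        self.counts: Counter = Counter()
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


class AttributeBST:
    """Hypothetical per-leaf store for a single continuous attribute."""
    def __init__(self):
        self.root: Optional[Node] = None
        self.totals: Counter = Counter()

    def insert(self, value: float, label: str) -> None:
        # Iterative descent: cost is the depth of the tree (O(log n) when balanced).
        self.totals[label] += 1
        if self.root is None:
            self.root = Node(value)
            self.root.counts[label] += 1
            return
        node = self.root
        while True:
            if value == node.value:
                node.counts[label] += 1
                return
            side = "left" if value < node.value else "right"
            child = getattr(node, side)
            if child is None:
                child = Node(value)
                child.counts[label] += 1
                setattr(node, side, child)
                return
            node = child

    def in_order(self):
        # Iterative in-order traversal yields nodes in ascending value order.
        stack, node = [], self.root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node
            node = node.right


def entropy(counts: Counter) -> float:
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values() if c > 0) if n else 0.0


def best_split(tree: AttributeBST):
    """One pass over the distinct values; returns (threshold, information gain)."""
    total = sum(tree.totals.values())
    base = entropy(tree.totals)
    left: Counter = Counter()
    best_value, best_gain = None, -1.0
    for node in tree.in_order():
        left.update(node.counts)          # examples with x <= node.value
        n_left = sum(left.values())
        n_right = total - n_left
        if n_right == 0:                  # last value: nothing left on the right side
            break
        right = tree.totals - left        # class counts for x > node.value
        gain = base - (n_left / total) * entropy(left) - (n_right / total) * entropy(right)
        if gain > best_gain:
            best_value, best_gain = node.value, gain
    return best_value, best_gain


# Usage: four examples of one attribute, two classes.
tree = AttributeBST()
for value, label in [(3.0, "b"), (1.0, "a"), (4.0, "b"), (2.0, "a")]:
    tree.insert(value, label)
print(best_split(tree))  # (2.0, 1.0): the test x <= 2.0 separates the classes
```

The pass costs time linear in the number of stored distinct values (for a fixed number of classes). A plain BST keeps the sketch short, but on sorted or nearly sorted streams it degenerates to linear depth; a balanced tree such as a red-black or AVL tree would be needed to guarantee O(log n) per insertion.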