C4.5: programs for machine learning
C4.5: programs for machine learning
The impact of changing populations on classifier performance
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining high-speed data streams
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining time-changing data streams
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Knowledge Discovery and Data Mining: The Info-Fuzzy Network (Ifn) Methodology
Knowledge Discovery and Data Mining: The Info-Fuzzy Network (Ifn) Methodology
Incremental Induction of Decision Trees
Machine Learning
Data streams: algorithms and applications
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
SLIQ: A Fast Scalable Classifier for Data Mining
EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
Incremental Fuzzy Decision Trees
KI '02 Proceedings of the 25th Annual German Conference on AI: Advances in Artificial Intelligence
A framework for diagnosing changes in evolving data streams
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
STREAM: the stanford stream data manager (demonstration description)
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Mining concept-drifting data streams using ensemble classifiers
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Accurate decision trees for mining high-speed data streams
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Efficient decision tree construction on streaming data
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
On demand classification of data streams
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
ACM SIGMOD Record
An Efficient Classification System Based on Binary Search Trees for Data Streams Mining
ICONS '07 Proceedings of the Second International Conference on Systems
Online classification of nonstationary data streams
Intelligent Data Analysis
Detecting change in data streams
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
StreamMiner: a classifier ensemble-based engine to mine concept-drifting data streams
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
PAKDD'07 Proceedings of the 2007 international conference on Emerging technologies in knowledge discovery and data mining
Fuzzy decision trees: issues and methods
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Building fast decision trees from large training sets
Intelligent Data Analysis
Hi-index | 0.00 |
Decision tree construction is a well-studied problem in data mining. Recently, there has been much interest in mining data streams. Domingos and Hulten have presented a one-pass algorithm for decision tree constructions. Their system using Hoeffding inequality to achieve a probabilistic bound on the accuracy of the tree constructed. Gama et al. have extended VFDT in two directions. Their system VFDTc can deal with continuous data and use more powerful classification techniques at tree leaves. Peng et al. present soft discretization method to solve continuous attributes in data mining. In this paper, we revisit these problems and implemented a system sVFDT for data stream mining. We make the following contributions: 1) we present a binary search trees (BST) approach for efficiently handling continuous attributes. Its processing time for values inserting is O(nlogn), while VFDT's processing time is O(n2). 2) We improve the method of getting the best split-test point of a given continuous attribute. Comparing to the method used in VFDTc, it decreases from O(nlogn) to O (n) in processing time. 3) Comparing to VFDTc, sVFDT's candidate split-test number decrease from O(n) to O(logn).4)Improve the soft discretization method to increase classification accuracy in data stream mining.