WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 03
Classification and prediction algorithms for machine learning typically require all training data to be resident in memory during decision tree construction. A flat file is usually created from database or data warehouse data and loaded into memory for processing, which severely limits the scalability of these algorithms for practical data mining tasks. Some researchers have implemented disk-based algorithms that can handle much larger training sets. Both approaches, in-memory and disk-based, suffer from three serious limitations. First, a significant amount of the original data must be duplicated on disk. Second, these algorithms cannot use the computational capabilities of the data warehouse or database system; sums, counts and averages, for example, are aggregates that database and data warehouse systems compute very efficiently. Third, these algorithms produce inflexible decision trees that the analyst cannot manipulate, because they do not take advantage of the features of data cube technology that enable analysts to view data at different levels of abstraction. This paper proposes a data mining approach that removes the need to copy data out of a data warehouse or database. Because it integrates database and data warehouse queries with decision tree construction, the approach also facilitates On-line Analytical Mining (OLAM).
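The abstract does not spell out the paper's algorithm, so what follows is only a minimal sketch of the general idea, not the authors' method. It assumes an ID3-style information-gain criterion and an SQLite database. The split statistics come from `GROUP BY` count queries, so only aggregated counts leave the database and no raw rows are copied into a flat file. A roll-up view over a city → region dimension then evaluates the same criterion at a coarser level of abstraction, in the spirit of data cube roll-up. The tables `customers` and `city_dim`, the view `customers_by_region`, the toy data and all helper functions are hypothetical.

```python
import math
import sqlite3


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def class_counts_by_value(conn, table, attr, label):
    """Ask the database for per-(attribute value, class) counts.

    Only these aggregated counts leave the database; the raw rows are
    never loaded into application memory.
    """
    # Identifiers cannot be bound as SQL parameters, so they are
    # interpolated here; in real code they must come from a trusted schema.
    query = (f"SELECT {attr}, {label}, COUNT(*) FROM {table} "
             f"GROUP BY {attr}, {label}")
    counts = {}
    for value, cls, n in conn.execute(query):
        counts.setdefault(value, {})[cls] = n
    return counts


def information_gain(counts):
    """Information gain of a multiway split, computed from counts alone."""
    class_totals = {}
    for per_class in counts.values():
        for cls, n in per_class.items():
            class_totals[cls] = class_totals.get(cls, 0) + n
    total = sum(class_totals.values())
    before = entropy(list(class_totals.values()))
    # Weighted entropy of the partitions induced by the attribute's values.
    after = sum(
        sum(per_class.values()) / total * entropy(list(per_class.values()))
        for per_class in counts.values()
    )
    return before - after


def build_demo_db():
    """Hypothetical toy warehouse: a fact table plus a city -> region dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (age TEXT, city TEXT, buys TEXT);
        CREATE TABLE city_dim (city TEXT PRIMARY KEY, region TEXT);
        -- Roll-up view: the same facts seen one level higher in the hierarchy.
        CREATE VIEW customers_by_region AS
            SELECT d.region AS region, c.buys AS buys
            FROM customers AS c JOIN city_dim AS d ON c.city = d.city;
    """)
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
        ("youth", "Vancouver", "no"),
        ("youth", "Toronto", "no"),
        ("senior", "Vancouver", "yes"),
        ("senior", "Montreal", "yes"),
    ])
    conn.executemany("INSERT INTO city_dim VALUES (?, ?)", [
        ("Vancouver", "West"), ("Toronto", "East"), ("Montreal", "East"),
    ])
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for table, attr in [("customers", "age"), ("customers", "city"),
                        ("customers_by_region", "region")]:
        gain = information_gain(class_counts_by_value(conn, table, attr, "buys"))
        print(f"{attr}: {gain:.3f}")
    # Prints:
    # age: 1.000
    # city: 0.500
    # region: 0.000
```

The design choice here is to push the counting into SQL and keep only the small count tables in Python, which is what lets a split criterion run without duplicating the training data. The roll-up view shows that the same attribute can look useful at one level of abstraction (city) and useless at another (region), which is the kind of analyst-driven exploration the abstract attributes to data cube technology.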