In this paper the authors present a novel method for finding optimal split points for the discretization of continuous attributes. Such a method can be used in many data mining techniques for large databases. The method consists of two major steps. In the first step, the search space is pruned using a bisecting region method that partitions the search space and returns the point with the highest information gain found during its search. In the second step, a hill-climbing algorithm starts from the point returned by the first step and greedily searches for an optimal point. The method was tested on fifteen attributes from two data sets. The results show that it drastically reduces the number of candidate split points that must be evaluated while still identifying optimal or near-optimal split points: on average, the number of information gain calculations fell by 98%, at the cost of only a 4% reduction in information gain.
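The two-step search can be illustrated with a minimal Python sketch, written under stated assumptions rather than as the authors' implementation. The abstract does not say how the bisecting region method decides which half of the search space to keep, so the hypothetical `bisect_search` assumes it probes the middle candidate of each half and descends into the half whose probe has the higher information gain. Candidate thresholds are assumed to be midpoints between consecutive distinct attribute values, and all function names (`entropy`, `candidate_splits`, `info_gain`, `bisect_search`, `hill_climb`, `find_split`) are illustrative. A cache counts how many information gain calculations the search actually performs.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple


def entropy(labels: Sequence) -> float:
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def candidate_splits(values: Sequence[float]) -> List[float]:
    """Assumed candidates: midpoints between consecutive distinct sorted values."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def info_gain(values: Sequence[float], labels: Sequence, threshold: float) -> float:
    """Information gain of the binary split `value <= threshold` vs `value > threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def bisect_search(gain: Callable[[int], float], lo: int, hi: int) -> int:
    """Step 1 (hypothetical form): halve the index range [lo, hi], probe the
    middle candidate of each half, keep the half whose probe scores higher,
    and return the best-scoring index seen along the way."""
    best = lo
    while lo < hi:
        mid = (lo + hi) // 2
        left_probe, right_probe = (lo + mid) // 2, (mid + 1 + hi) // 2
        for probe in (left_probe, right_probe):
            if gain(probe) > gain(best):
                best = probe
        if gain(left_probe) >= gain(right_probe):
            hi = mid  # keep the left half [lo, mid]
        else:
            lo = mid + 1  # keep the right half [mid + 1, hi]
    return lo if gain(lo) > gain(best) else best


def hill_climb(gain: Callable[[int], float], start: int, n: int) -> int:
    """Step 2: greedily move to the better adjacent candidate until neither
    neighbour improves the gain (a local maximum)."""
    current = start
    while True:
        neighbours = [j for j in (current - 1, current + 1) if 0 <= j < n]
        best_neighbour = max(neighbours, key=gain, default=None)
        if best_neighbour is None or gain(best_neighbour) <= gain(current):
            return current
        current = best_neighbour


def find_split(values: Sequence[float], labels: Sequence) -> Tuple[float, float, int]:
    """Return (threshold, gain, number of information gain calculations)."""
    splits = candidate_splits(values)
    if not splits:
        raise ValueError("attribute needs at least two distinct values")
    cache: Dict[int, float] = {}

    def gain(i: int) -> float:
        # Memoize so each candidate's gain is computed at most once.
        if i not in cache:
            cache[i] = info_gain(values, labels, splits[i])
        return cache[i]

    start = bisect_search(gain, 0, len(splits) - 1)
    best = hill_climb(gain, start, len(splits))
    return splits[best], gain(best), len(cache)


values = list(range(20))
labels = [0] * 10 + [1] * 10
threshold, best_gain, evaluations = find_split(values, labels)
print(threshold, best_gain, evaluations, "of", len(candidate_splits(values)))
```

On this toy attribute the sketch should report a threshold of 9.5 with gain 1.0, found without evaluating all 19 candidate thresholds. Because hill climbing only guarantees a local maximum, an attribute whose gain curve has several peaks can yield a near-optimal rather than optimal split, which is consistent with the "optimal or near-optimal" result described above. The size of the saving depends on how many candidate thresholds an attribute has.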