An Efficient Method for Discretizing Continuous Attributes

Authors:
Kelley M. Engle; Aryya Gangopadhyay
Affiliations:
University of Maryland Baltimore County, USA; University of Maryland Baltimore County, USA
Venue:
International Journal of Data Warehousing and Mining
Year:
2010



Abstract

In this paper, the authors present a novel method for finding optimal split points when discretizing continuous attributes. Such a method can be used in many data mining techniques for large databases. The method consists of two major steps. In the first step, the search space is pruned by a bisecting region method, which partitions the search space and returns the point with the highest information gain among those it examines. In the second step, a hill-climbing algorithm starts from the point returned by the first step and greedily searches for an optimal point. The method was tested on fifteen attributes from two data sets. The results show that it drastically reduces the number of searches while identifying optimal or near-optimal split points. On average, there was a 98% reduction in the number of information gain calculations with only a 4% reduction in information gain.
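The two steps described in the abstract can be illustrated with a rough Python sketch. The abstract does not give the details of the bisecting region method, so the halving rule below is an assumption, not the authors' algorithm: it probes the midpoints of the two halves of the current index range and descends into the half whose midpoint has the higher information gain. All function names are hypothetical, and the sketch assumes a single numeric attribute with distinct values.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(labels, i):
    """Gain from splitting value-sorted labels into labels[:i] and labels[i:]."""
    n = len(labels)
    left, right = labels[:i], labels[i:]
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))


def bisect_search(n, gain):
    """Step 1 (assumed form): halve the index range, keeping the best probe."""
    lo, hi = 1, n - 1              # valid split indices are 1 .. n-1
    best = (lo + hi) // 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        left_mid, right_mid = (lo + mid) // 2, (mid + hi) // 2
        best = max((best, mid, left_mid, right_mid), key=gain)
        # Assumption: keep the half whose midpoint looks more promising.
        if gain(left_mid) >= gain(right_mid):
            hi = mid
        else:
            lo = mid
    return best


def hill_climb(start, n, gain):
    """Step 2: move to the better neighbouring index until neither improves."""
    current = start
    while True:
        neighbours = [j for j in (current - 1, current + 1) if 1 <= j <= n - 1]
        nxt = max(neighbours, key=gain, default=current)
        if gain(nxt) <= gain(current):
            return current
        current = nxt


def find_split(values, labels):
    """Return (threshold, information gain, number of gain evaluations)."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [c for _, c in pairs]
    cache = {}                     # memoize so each index is evaluated once

    def gain(i):
        if i not in cache:
            cache[i] = info_gain(ys, i)
        return cache[i]

    i = hill_climb(bisect_search(len(ys), gain), len(ys), gain)
    threshold = (xs[i - 1] + xs[i]) / 2   # midpoint between adjacent values
    return threshold, gain(i), len(cache)


if __name__ == "__main__":
    values = [1, 2, 3, 4, 5, 6, 7, 8]
    labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(find_split(values, labels))     # threshold is 4.5 for this toy data
```

The third returned value counts how many distinct split indices were scored, which is the quantity the abstract reports as reduced relative to an exhaustive scan over all n - 1 candidates. On toy data this small the saving is negligible; the sketch only shows the shape of the search.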