Implementation and Evaluation of Decision Trees with Range and Region Splitting

Authors:
Yasuhiko Morimoto;Takeshi Fukuda;Shinichi Morishita;Takeshi Tokuyama
Affiliations:
-;-;-;IBM Tokyo Research Laboratory, 1623-14, Shimo-tsuruma, Yamato City, Kanagawa Pref, 242, JAPAN
Venue:
Constraints
Year:
1998

Abstract

We propose an extension of an entropy-based heuristic for constructing a decision tree from a large database with many numeric attributes. When it comes to handling numeric attributes, conventional methods are inefficient if any numeric attributes are strongly correlated. Our approach offers one solution to this problem. For each pair of numeric attributes with strong correlation, we compute a two-dimensional association rule with respect to these attributes and the objective attribute of the decision tree. In particular, we consider a family {\cal R} of grid-regions in the plane associated with the pair of attributes. For R \in {\cal R}, the data can be split into two classes: data inside R and data outside R. We compute the region R_{opt} \in {\cal R} that minimizes the entropy of the splitting, and add the splitting associated with R_{opt} (for each pair of strongly correlated attributes) to the set of candidate tests in an entropy-based heuristic. We give efficient algorithms for cases in which {\cal R} is (1) x-monotone connected regions, (2) based-monotone regions, (3) rectangles, and (4) rectilinear convex regions. The algorithm has been implemented as a subsystem of SONAR (System for Optimized Numeric Association Rules) developed by the authors. We have confirmed that we can compute the optimal region efficiently. And diverse experiments show that our approach can create compact trees whose accuracy is comparable with or better than that of conventional trees. More importantly, we can grasp non-linear correlation among numeric attributes which could not be found without our region splitting.
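
To make the notion of "the entropy of the splitting" concrete, the following is a minimal sketch in Python, not the authors' algorithm. It assumes a two-class objective attribute, a grid whose cells hold counts of positive and negative tuples, and the rectangle family only (case 3); it finds the best rectangle by exhaustive search over all grid rectangles, which is far slower than the efficient algorithms the abstract refers to. All function names (`entropy`, `split_entropy`, `best_rectangle`, and the helpers) are hypothetical and chosen for illustration.

```python
import math


def entropy(pos, neg):
    """Entropy in bits of a group with `pos` positive and `neg` negative tuples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count > 0:
            p = count / total
            h -= p * math.log2(p)
    return h


def split_entropy(in_pos, in_neg, all_pos, all_neg):
    """Size-weighted average entropy of the two classes: inside R and outside R."""
    n = all_pos + all_neg
    n_in = in_pos + in_neg
    out_pos, out_neg = all_pos - in_pos, all_neg - in_neg
    return (n_in / n) * entropy(in_pos, in_neg) + ((n - n_in) / n) * entropy(out_pos, out_neg)


def prefix_sums(grid):
    """2D prefix sums so that any rectangle sum costs O(1)."""
    rows, cols = len(grid), len(grid[0])
    ps = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            ps[i + 1][j + 1] = grid[i][j] + ps[i][j + 1] + ps[i + 1][j] - ps[i][j]
    return ps


def rect_sum(ps, r1, c1, r2, c2):
    """Sum of the cells in rows r1..r2 and columns c1..c2 (inclusive)."""
    return ps[r2 + 1][c2 + 1] - ps[r1][c2 + 1] - ps[r2 + 1][c1] + ps[r1][c1]


def best_rectangle(pos_grid, neg_grid):
    """Exhaustively search all grid rectangles for the minimum split entropy.

    Returns (entropy, (r1, c1, r2, c2)). Assumption: brute force, for illustration only.
    """
    rows, cols = len(pos_grid), len(pos_grid[0])
    pp, pn = prefix_sums(pos_grid), prefix_sums(neg_grid)
    all_pos, all_neg = pp[rows][cols], pn[rows][cols]
    if all_pos + all_neg == 0:
        raise ValueError("the grid contains no tuples")
    best_h, best_rect = float("inf"), None
    for r1 in range(rows):
        for r2 in range(r1, rows):
            for c1 in range(cols):
                for c2 in range(c1, cols):
                    in_pos = rect_sum(pp, r1, c1, r2, c2)
                    in_neg = rect_sum(pn, r1, c1, r2, c2)
                    h = split_entropy(in_pos, in_neg, all_pos, all_neg)
                    if h < best_h:
                        best_h, best_rect = h, (r1, c1, r2, c2)
    return best_h, best_rect


# Made-up example data: positives are concentrated in the central cell.
pos_grid = [[0, 0, 0], [0, 5, 0], [0, 0, 0]]
neg_grid = [[2, 2, 2], [2, 0, 2], [2, 2, 2]]
best_h, best_rect = best_rectangle(pos_grid, neg_grid)
```

In this toy grid the central cell separates the two classes perfectly, so the search selects the rectangle covering only that cell. In a tree-building heuristic, the resulting split would be one more candidate test to compare against the usual single-attribute tests.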