We propose an extension of an entropy-based heuristic for constructing a decision tree from a large database with many numeric attributes. Conventional methods for handling numeric attributes are inefficient if any numeric attributes are strongly correlated. Our approach offers one solution to this problem. For each pair of numeric attributes with strong correlation, we compute a two-dimensional association rule with respect to these attributes and the objective attribute of the decision tree. In particular, we consider a family $\mathcal{R}$ of grid-regions in the plane associated with the pair of attributes. For $R \in \mathcal{R}$, the data can be split into two classes: data inside $R$ and data outside $R$. We compute the region $R_{opt} \in \mathcal{R}$ that minimizes the entropy of the splitting, and add the splitting associated with $R_{opt}$ (for each pair of strongly correlated attributes) to the set of candidate tests in an entropy-based heuristic. We give efficient algorithms for cases in which $\mathcal{R}$ is the family of (1) x-monotone connected regions, (2) based-monotone regions, (3) rectangles, and (4) rectilinear convex regions. The algorithm has been implemented as a subsystem of SONAR (System for Optimized Numeric Association Rules), developed by the authors. We have confirmed that the optimal region can be computed efficiently. Diverse experiments show that our approach can create compact trees whose accuracy is comparable with or better than that of conventional trees. More importantly, we can grasp non-linear correlation among numeric attributes that could not be found without our region splitting.
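To make the notion of an entropy-minimizing region split concrete, here is a minimal sketch for the rectangle case only. It is not the authors' efficient algorithm: it is a brute-force search over all grid rectangles, and it assumes a binary objective attribute and the usual weighted class entropy of a two-way split, neither of which is spelled out in the abstract. The names `entropy`, `split_entropy`, `best_rectangle`, and the per-cell count grids `pos_grid` and `neg_grid` are hypothetical.

```python
from math import log2


def entropy(pos, neg):
    """Binary class entropy of a set with `pos` positive and `neg` negative tuples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:  # skip empty classes, treating 0 * log(0) as 0
            p = count / total
            h -= p * log2(p)
    return h


def split_entropy(in_pos, in_neg, out_pos, out_neg):
    """Size-weighted entropy of the split (assumed definition, not from the abstract)."""
    n_in, n_out = in_pos + in_neg, out_pos + out_neg
    n = n_in + n_out
    if n == 0:
        return 0.0
    return (n_in / n) * entropy(in_pos, in_neg) + (n_out / n) * entropy(out_pos, out_neg)


def prefix_sums(grid):
    """2D prefix sums so that any rectangle total costs O(1)."""
    rows, cols = len(grid), len(grid[0])
    ps = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            ps[i + 1][j + 1] = grid[i][j] + ps[i][j + 1] + ps[i + 1][j] - ps[i][j]
    return ps


def rect_sum(ps, r0, c0, r1, c1):
    """Total over rows r0..r1-1 and columns c0..c1-1."""
    return ps[r1][c1] - ps[r0][c1] - ps[r1][c0] + ps[r0][c0]


def best_rectangle(pos_grid, neg_grid):
    """Brute-force search for the grid rectangle minimizing the split entropy.

    Returns (entropy, (r0, c0, r1, c1)) with half-open row and column ranges.
    """
    rows, cols = len(pos_grid), len(pos_grid[0])
    pp, pn = prefix_sums(pos_grid), prefix_sums(neg_grid)
    tot_pos, tot_neg = pp[rows][cols], pn[rows][cols]
    best_h, best_rect = float("inf"), None
    for r0 in range(rows):
        for r1 in range(r0 + 1, rows + 1):
            for c0 in range(cols):
                for c1 in range(c0 + 1, cols + 1):
                    ip = rect_sum(pp, r0, c0, r1, c1)
                    ineg = rect_sum(pn, r0, c0, r1, c1)
                    h = split_entropy(ip, ineg, tot_pos - ip, tot_neg - ineg)
                    if h < best_h:
                        best_h, best_rect = h, (r0, c0, r1, c1)
    return best_h, best_rect
```

Each cell of the two grids holds the number of positive or negative tuples whose attribute pair falls into that bucket. The search visits every one of the O(rows² · cols²) rectangles, which is only practical for small grids; the other region families named above (x-monotone, based-monotone, rectilinear convex) are far too numerous to enumerate this way.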