General and Efficient Multisplitting of Numerical Attributes

Authors:
Tapio Elomaa; Juho Rousu
Affiliations:
Department of Computer Science, P.O. Box 26, FIN-00014 University of Helsinki, Finland. elomaa@cs.helsinki.fi; VTT Biotechnology and Food Research, Tietotie 2, P.O. Box 1501, FIN-02044 VTT, Finland. Juho.Rousu@vtt.fi
Venue:
Machine Learning
Year:
1999

Abstract

Often in supervised learning numerical attributes require special treatment and do not fit the learning scheme as well as one could hope. Nevertheless, they are common in practical tasks and, therefore, need to be taken into account. We characterize the well-behavedness of an evaluation function, a property that guarantees the optimal multi-partition of an arbitrary numerical domain to be defined on boundary points. Well-behavedness reduces the number of candidate cut points that need to be examined in multisplitting numerical attributes. Many commonly used attribute evaluation functions possess this property; we demonstrate that the cumulative functions Information Gain and Training Set Error as well as the non-cumulative functions Gain Ratio and Normalized Distance Measure are all well-behaved. We also devise a method of finding optimal multisplits efficiently by examining the minimum number of boundary point combinations that is required to produce partitions which are optimal with respect to a cumulative and well-behaved evaluation function. Our empirical experiments validate the utility of optimal multisplitting: it produces constantly better partitions than alternative approaches do and it only requires comparable time. In top-down induction of decision trees the choice of evaluation function has a more decisive effect on the result than the choice of partitioning strategy; optimizing the value of most common attribute evaluation functions does not raise the accuracy of the produced decision trees. In our tests the construction time using optimal multisplitting was, on the average, twice that required by greedy multisplitting, which in its part required on the average twice the time of binary splitting.
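
The idea of restricting the search to boundary points can be illustrated with a small sketch. The code below is not the authors' algorithm: it is a plain dynamic program, written under the assumption that the evaluation function is Training Set Error (one of the cumulative, well-behaved functions named in the abstract) and that the input is non-empty with at least one interval allowed. The names `class_blocks`, `training_set_error`, `optimal_multisplit` and `max_intervals` are hypothetical and chosen only for this illustration. Adjacent attribute values are first merged into blocks wherever no boundary point separates them, and the search for the best multisplit then considers cuts only between blocks.

```python
from collections import Counter
from itertools import groupby


def class_blocks(values, labels):
    """Merge adjacent attribute values into blocks delimited by boundary points.

    Returns a list of (lowest value, highest value, class histogram) tuples.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    blocks = []
    for value, group in groupby(pairs, key=lambda p: p[0]):
        hist = Counter(label for _, label in group)
        if blocks:
            lo, _, prev = blocks[-1]
            # No boundary point lies between two neighbours that are both
            # pure and share the same class, so merge them into one block.
            if len(hist) == 1 and len(prev) == 1 and set(hist) == set(prev):
                blocks[-1] = (lo, value, prev + hist)
                continue
        blocks.append((value, value, hist))
    return blocks


def training_set_error(hist):
    """Number of examples in an interval outside its majority class."""
    return sum(hist.values()) - max(hist.values())


def optimal_multisplit(values, labels, max_intervals):
    """Return (error, thresholds) of a minimum-error split into at most
    max_intervals intervals, cutting only at boundary points.

    Assumes non-empty input and max_intervals >= 1 (illustrative sketch).
    """
    blocks = class_blocks(values, labels)
    n = len(blocks)
    k_max = min(max_intervals, n)
    # cost[i][j]: error of a single interval covering blocks i .. j-1.
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        acc = Counter()
        for j in range(i + 1, n + 1):
            acc += blocks[j - 1][2]
            cost[i][j] = training_set_error(acc)
    inf = float("inf")
    # best[k][j]: minimum error of splitting the first j blocks into k intervals.
    best = [[inf] * (n + 1) for _ in range(k_max + 1)]
    back = [[0] * (n + 1) for _ in range(k_max + 1)]
    best[0][0] = 0
    for k in range(1, k_max + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost[i][j]
                if candidate < best[k][j]:
                    best[k][j], back[k][j] = candidate, i
    # Prefer the smallest arity that reaches the minimum error.
    k = min(range(1, k_max + 1), key=lambda a: (best[a][n], a))
    error, cuts, j = best[k][n], [], n
    while k > 0:
        i = back[k][j]
        if i > 0:
            # Place the threshold halfway between the two adjacent blocks.
            cuts.append((blocks[i - 1][1] + blocks[i][0]) / 2)
        j, k = i, k - 1
    return error, sorted(cuts)
```

For instance, calling `optimal_multisplit([1, 2, 3, 4, 5, 6], list("aabbaa"), 3)` merges the six values into three blocks and searches over two candidate cut points instead of five. The cubic-time table fill is kept for readability; it makes no attempt to reproduce the efficiency results reported in the paper.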