Discretization: An Enabling Technique

Authors:
Huan Liu; Farhad Hussain; Chew Lim Tan; Manoranjan Dash
Affiliations:
School of Computing, National University of Singapore, Singapore. hliu@asu.edu
School of Computing, National University of Singapore, Singapore. farhad@comp.nus.edu.sg
School of Computing, National University of Singapore, Singapore. tancl@comp.nus.edu.sg
School of Computing, National University of Singapore, Singapore. manoranj@comp.nus.edu.sg
Venue:
Data Mining and Knowledge Discovery
Year:
2002

Abstract

Discrete values play an important role in data mining and knowledge discovery. They represent intervals of numbers, which are more concise to represent and specify and easier to use and comprehend than continuous values because they are closer to a knowledge-level representation. Many studies show that induction tasks can benefit from discretization: rules with discrete values are normally shorter and more understandable, and discretization can lead to improved predictive accuracy. Furthermore, many induction algorithms in the literature require discrete features. All of this prompts researchers and practitioners to discretize continuous features before or during a machine learning or data mining task. Numerous discretization methods are available in the literature. It is time to examine these seemingly different methods and find out how different they really are, what the key components of a discretization process are, and how we can improve the current level of research for new development as well as the use of existing methods. This paper aims at a systematic study of discretization methods, covering their history of development, their effect on classification, and the trade-off between speed and accuracy. The contributions of this paper are an abstract description summarizing existing discretization methods, a hierarchical framework that categorizes the existing methods and paves the way for further development, concise discussions of representative discretization methods, extensive experiments and their analysis, and some guidelines on how to choose a discretization method under various circumstances. We also identify some unsolved issues and directions for future research on discretization.
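
As a rough illustration of what discretizing a continuous feature into intervals can look like, the following minimal sketch applies equal-width binning, one simple unsupervised scheme. It is not presented as the method studied in the paper; the function names `equal_width_cut_points` and `discretize`, the sample `ages` data, and the choice of three intervals are all assumptions made for the example.

```python
from bisect import bisect_right


def equal_width_cut_points(values, k):
    """Return k-1 cut points that split [min, max] into k equal-width intervals.

    Hypothetical helper for illustration only.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def discretize(values, cut_points):
    """Map each continuous value to the index of the interval it falls in.

    Interval i covers [cut_points[i-1], cut_points[i]); the first interval is
    open below and the last is open above.
    """
    return [bisect_right(cut_points, v) for v in values]


# Assumed sample data: a continuous "age" feature.
ages = [18, 22, 25, 31, 40, 47, 52, 60, 78]
cuts = equal_width_cut_points(ages, 3)   # cut points at 38.0 and 58.0
labels = discretize(ages, cuts)          # interval index per value

print(cuts)
print(labels)
```

Each value is replaced by the index of its interval, so a rule learner would see three discrete values instead of nine distinct numbers. Supervised schemes would instead choose the cut points using class labels.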