Conversion of categorical variables into numerical variables via Bayesian network classifiers for binary classifications

Authors:
Namgil Lee; Jong-Min Kim
Affiliations:
Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Republic of Korea; Statistics Discipline, Division of Science and Mathematics, University of Minnesota-Morris, Morris, MN 56267, USA
Venue:
Computational Statistics & Data Analysis
Year:
2010

Abstract

Many pattern classification algorithms, such as Support Vector Machines (SVMs), Multi-Layer Perceptrons (MLPs), and K-Nearest Neighbors (KNNs), require data consisting of purely numerical variables. However, many real-world data sets contain both categorical and numerical variables. In this paper, we propose an effective method for converting mixed data of categorical and numerical variables into purely numerical data for binary classification. Because the proposed method is based on the theory of learning Bayesian Network Classifiers (BNCs), it is computationally efficient and robust to noise and data loss. The method is also expected to extract sufficient information for estimating a minimum-error-rate (MER) classifier. Simulations on artificial and real-world data sets demonstrate the competitiveness of the proposed method when the number of values in each categorical variable is large and BNCs model the data accurately.
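
The abstract does not spell out the conversion itself, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes the simplest BNC structure (naive Bayes, where each categorical variable depends only on the binary class) and replaces each category with a smoothed log-ratio of its class-conditional probabilities. The function names `fit_log_ratio_encoder` and `transform`, the smoothing parameter `alpha`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def fit_log_ratio_encoder(values, labels, alpha=1.0):
    """Map each category v of one variable to log P(v | y=1) - log P(v | y=0).

    Assumption (not from the paper): a naive Bayes structure and Laplace
    smoothing with pseudo-count `alpha`. Labels must be 0 or 1.
    """
    categories = sorted(set(values))
    counts = {0: Counter(), 1: Counter()}
    for v, y in zip(values, labels):
        counts[y][v] += 1
    totals = {y: sum(counts[y].values()) for y in (0, 1)}
    k = len(categories)

    table = {}
    for v in categories:
        # Smoothed class-conditional probability estimates.
        p1 = (counts[1][v] + alpha) / (totals[1] + alpha * k)
        p0 = (counts[0][v] + alpha) / (totals[0] + alpha * k)
        table[v] = math.log(p1) - math.log(p0)
    return table


def transform(values, table, default=0.0):
    """Replace categories with their scores; unseen categories get `default`
    (0.0 means "no evidence for either class", an assumed design choice)."""
    return [table.get(v, default) for v in values]


# Toy data: one categorical variable and a binary class label.
colors = ["red", "red", "blue", "green", "blue", "red"]
labels = [1, 1, 0, 0, 0, 1]

table = fit_log_ratio_encoder(colors, labels)
encoded = transform(colors + ["purple"], table)  # "purple" is unseen
```

Under these assumptions, each categorical variable becomes a single numerical column regardless of how many values it takes, and that column can be passed to an SVM, MLP, or KNN alongside the original numerical variables. A positive score indicates a category seen more often with class 1, and a negative score one seen more often with class 0.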