Comparisons of classification methods in the original and pattern spaces

Authors:
Jeong Han; Norman Kim; Myong K. Jeong; Bong-Jin Yum
Affiliations:
Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea; RUTCOR (Rutgers Center for Operations Research), Rutgers, The State University of New Jersey, Piscataway, NJ, USA; RUTCOR (Rutgers Center for Operations Research), Rutgers, The State University of New Jersey, Piscataway, NJ, USA; Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
Venue:
Expert Systems with Applications: An International Journal
Year:
2011

Abstract

The logical analysis of data (LAD) is one of the most promising data mining and machine learning techniques developed to date for extracting knowledge from data. The LAD is based on concepts from combinatorics, optimization, and Boolean functions, and its key feature is the capability of detecting hidden patterns in the data. Since patterns are essentially combinations of certain attributes, they can be used to build a decision boundary for classification in the LAD by providing important information for distinguishing observations in one class from those in the other class. Because patterns are robust to measurement errors, their use may result in more stable performance in classifying both the positive and negative classes. The patterns are also interpretable and can serve as an essential tool for understanding the problem. These desirable properties motivate the use of LAD patterns as input variables to other classification techniques in order to achieve more stable and accurate performance. In this paper, the patterns generated by the LAD are used as input variables to the decision tree and k-nearest neighbor classification methods, and the applicability and usefulness of the LAD patterns for classification are investigated experimentally. The classification accuracy and sensitivity of different classifiers in the original and pattern spaces are compared using several public data sets. The experimental results show that classification in the pattern space can yield better and more stable accuracy than classification in the original space when the classification accuracy of the LAD is relatively good (i.e., the LAD patterns are of good quality), the ratio of the number of patterns to the total number of attributes is small, or the data set is balanced between the two classes.
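
The idea of classifying in the pattern space rather than the original space can be illustrated with a minimal Python sketch. It is not the authors' implementation: the patterns below are invented for illustration (real LAD patterns would come from a pattern-generation procedure not shown here), each pattern is assumed to be a conjunction of interval conditions on numerical attributes, and the helper names `covers`, `to_pattern_space`, and `knn_predict` are hypothetical. Each observation is mapped to a binary vector indicating which patterns cover it, and a k-nearest neighbor vote is then taken using Hamming distance between those vectors.

```python
from collections import Counter

INF = float("inf")

# Hypothetical patterns: each is a conjunction of (attribute index, lower, upper)
# conditions. These are made up for illustration, not generated by LAD.
PATTERNS = [
    [(0, 5.0, INF)],                  # x0 >= 5
    [(1, -INF, 2.0)],                 # x1 <= 2
    [(0, 5.0, INF), (1, 2.0, INF)],   # x0 >= 5 and x1 >= 2
]

def covers(pattern, x):
    """Return True if observation x satisfies every condition of the pattern."""
    return all(lo <= x[i] <= hi for i, lo, hi in pattern)

def to_pattern_space(x, patterns=PATTERNS):
    """Map an observation to a 0/1 vector: one entry per pattern that covers it."""
    return [1 if covers(p, x) else 0 for p in patterns]

def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(u != v for u, v in zip(a, b))

def knn_predict(train_X, train_y, x, k=3, patterns=PATTERNS):
    """Majority vote among the k training points nearest to x in pattern space."""
    z = to_pattern_space(x, patterns)
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: hamming(to_pattern_space(pair[0], patterns), z),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy data (assumed): label 1 is the positive class, 0 the negative class.
train_X = [[6.0, 1.0], [7.0, 0.5], [5.5, 1.5], [1.0, 3.0], [2.0, 4.0], [0.0, 5.0]]
train_y = [1, 1, 1, 0, 0, 0]
```

With this toy data, `to_pattern_space([6.0, 1.0])` gives `[1, 1, 0]`, and `knn_predict(train_X, train_y, [6.5, 1.0])` returns `1`. A decision tree could be trained on the same binary vectors in place of the k-nearest neighbor vote; the choice of Hamming distance here is an assumption made for the sketch, not something stated in the abstract.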