An information granulation based data mining approach for classifying imbalanced data

Authors:
Mu-Chen Chen; Long-Sheng Chen; Chun-Chin Hsu; Wei-Rong Zeng
Affiliations:
Institute of Traffic and Transportation, National Chiao Tung University, 4F, 118, Section 1, Chung-Hsiao W. Road, Taipei 10012, Taiwan; Department of Information Management, Chaoyang University of Technology, 168, Jifong E. Road, Wufong Township, Taichung County 41349, Taiwan; Department of Industrial Engineering and Management, Chaoyang University of Technology, 168, Jifong E. Road, Wufong Township, Taichung County 41349, Taiwan; Information Management Department, Entie Commercial Bank, Taipei, Taiwan
Venue:
Information Sciences: an International Journal
Year:
2008

Abstract

Recently, the class imbalance problem has attracted much attention from researchers in the field of data mining. When learning from imbalanced data, in which most examples are labeled as one class and only a few belong to another, traditional data mining approaches predict the crucial minority instances poorly. Unfortunately, many real-world data sets, such as those from health examination, inspection, credit fraud detection, spam identification, and text mining, face this situation. In this study, we present a novel model called the "Information Granulation Based Data Mining Approach" to tackle this problem. The proposed methodology, which imitates the human ability to process information, acquires knowledge from Information Granules rather than from numerical data. The method also introduces a feature extraction tool based on Latent Semantic Indexing, which uses Singular Value Decomposition to dramatically reduce the data dimensions. Several data sets from the UCI Machine Learning Repository are employed to demonstrate the effectiveness of our method. Experimental results show that our method can significantly increase the ability to classify imbalanced data.
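
As a rough illustration of the dimension-reduction step mentioned in the abstract, the sketch below applies a generic truncated Singular Value Decomposition to a small feature matrix. It is not the authors' procedure: the abstract does not say which matrix is decomposed or how Information Granules are built, so the function name `lsi_reduce`, the toy matrix `X`, and the choice `k = 2` are all assumptions made for illustration.

```python
import numpy as np


def lsi_reduce(X, k):
    """Project the rows of X onto the top-k right singular vectors.

    Generic truncated SVD (the core of Latent Semantic Indexing);
    an illustrative sketch, not the method from the paper.
    """
    # Thin SVD: X = U @ diag(s) @ Vt, singular values sorted descending.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Vk = Vt[:k].T          # shape (n_features, k), orthonormal columns
    return X @ Vk, Vk      # reduced data has shape (n_samples, k)


# Hypothetical toy data: 6 majority-class rows, 2 minority-class rows,
# 4 numeric features each.
X = np.array([
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [1.1, 1.0, 0.1, 0.1],
    [1.0, 1.1, 0.0, 0.0],
    [0.8, 0.9, 0.2, 0.1],
    [1.0, 1.0, 0.1, 0.2],
    [0.1, 0.0, 1.0, 0.9],
    [0.0, 0.2, 0.9, 1.0],
])

Z, Vk = lsi_reduce(X, k=2)
print(Z.shape)  # (8, 2)

# An unseen example is mapped with the same basis.
x_new = np.array([0.1, 0.1, 1.0, 1.0])
z_new = x_new @ Vk
```

Any classifier could then be trained on `Z` instead of `X`. How many singular vectors to keep is a tuning decision that the abstract does not specify.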