A novel field learning algorithm for dual imbalance text classification

Authors:
Ling Zhuang;Honghua Dai;Xiaoshu Hang
Affiliations:
School of Information Technology, Deakin University, VIC, Australia;School of Information Technology, Deakin University, VIC, Australia;School of Information Technology, Deakin University, VIC, Australia
Venue:
FSKD'05 Proceedings of the Second international conference on Fuzzy Systems and Knowledge Discovery - Volume Part II
Year:
2005

Abstract

The Fish-net algorithm is a novel field learning algorithm that derives classification rules from the range of values of each attribute rather than from individual point values. In this paper, we present a Feature Selection Fish-net learning algorithm to solve the dual imbalance problem in text classification. Dual imbalance comprises instance imbalance and feature imbalance: instance imbalance is caused by unevenly distributed classes, and feature imbalance arises from differences in document length. The proposed approach consists of two phases: (1) select a feature subset consisting of the features that are more supportive of the difficult minority class; (2) construct classification rules based on the original Fish-net algorithm. Our experimental results on Reuters-21578 show that the proposed approach achieves a better balanced accuracy rate on both the majority and minority classes than Naive Bayes Multinomial and SVM.
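To make the idea of learning from attribute ranges more concrete, the following is a minimal toy sketch and not the Fish-net algorithm itself, whose details the abstract does not give. It assumes a simple rule form: record the [min, max] range of each attribute over the minority-class training instances, then label a new instance as minority when enough of its attribute values fall inside those ranges. All names (`learn_ranges`, `range_score`, `predict`, `per_class_accuracy`), the threshold, and the data are hypothetical and chosen only for illustration. Accuracy is reported per class, since a single overall accuracy can hide poor performance on the minority class.

```python
from collections import defaultdict

def learn_ranges(X, y, target):
    """Record the (min, max) range of each attribute over instances labeled `target`."""
    rows = [x for x, label in zip(X, y) if label == target]
    n_features = len(rows[0])
    return [(min(r[j] for r in rows), max(r[j] for r in rows))
            for j in range(n_features)]

def range_score(x, ranges):
    """Fraction of attributes whose value lies inside the learned range."""
    hits = sum(1 for v, (lo, hi) in zip(x, ranges) if lo <= v <= hi)
    return hits / len(ranges)

def predict(x, ranges, threshold=0.5, target=1, other=0):
    """Assign `target` when the range score reaches the (assumed) threshold."""
    return target if range_score(x, ranges) >= threshold else other

def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for each class."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return {c: correct[c] / total[c] for c in total}

# Made-up term weights for two selected features; class 1 is the minority class.
X = [[0.8, 0.1], [0.9, 0.2],                          # minority (class 1)
     [0.1, 0.7], [0.2, 0.9], [0.0, 0.8], [0.3, 0.6]]  # majority (class 0)
y = [1, 1, 0, 0, 0, 0]

minority_ranges = learn_ranges(X, y, target=1)
predictions = [predict(x, minority_ranges) for x in X]
accuracy_by_class = per_class_accuracy(y, predictions)
print(minority_ranges)
print(accuracy_by_class)
```

In this sketch the feature subset is taken as given; in the two-phase approach described above, phase (1) would first choose the features that favor the minority class before any ranges are learned.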