Class dependent feature scaling method using naive Bayes classifier for text datamining

Authors:
Eunseog Youn; Myong K. Jeong
Affiliations:
Department of Computer Science, Texas Tech University, Lubbock, TX 79409, USA; Department of Industrial and Systems Engineering and RUTCOR (Rutgers Center for Operations Research), Rutgers, the State University of New Jersey, Piscataway, NJ 08854, USA
Venue:
Pattern Recognition Letters
Year:
2009

Abstract

Feature selection seeks a subset of features that gives optimal classification, and a critical part of it is ranking features by their importance for classification. The naive Bayes (NB) classifier has been used extensively in text categorization. We have developed a new feature scaling method, class-dependent feature weighting (CDFW), based on the NB classifier. A second method, CDFW-NB-RFE, combines CDFW with recursive feature elimination (RFE). In our experiments, CDFW-NB-RFE outperformed other popular feature ranking schemes on text datasets.
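
To make the idea of combining an NB-derived feature weight with recursive feature elimination concrete, here is a minimal sketch in Python. The abstract does not define the CDFW weight, so the score used below (the largest gap between classes in Laplace-smoothed log P(term | class)) is only an assumed stand-in and not the authors' formula. The function names `nb_log_likelihoods`, `feature_scores`, and `nb_rfe_ranking` are hypothetical, chosen for illustration.

```python
import math


def nb_log_likelihoods(X, y, features, alpha=1.0):
    """Laplace-smoothed log P(feature | class), restricted to the active features."""
    loglik = {}
    for c in sorted(set(y)):
        counts = {f: alpha for f in features}
        for row, label in zip(X, y):
            if label == c:
                for f in features:
                    counts[f] += row[f]
        total = sum(counts.values())
        loglik[c] = {f: math.log(counts[f] / total) for f in features}
    return loglik


def feature_scores(loglik, features):
    """ASSUMED stand-in weight (not the paper's CDFW): the spread of
    log P(feature | class) across classes. A feature that is equally
    likely in every class scores 0."""
    scores = {}
    for f in features:
        values = [loglik[c][f] for c in loglik]
        scores[f] = max(values) - min(values)
    return scores


def nb_rfe_ranking(X, y, alpha=1.0):
    """Recursive feature elimination: refit NB on the surviving features,
    drop the lowest-scoring one, and repeat. Returns feature indices
    ordered from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > 1:
        loglik = nb_log_likelihoods(X, y, remaining, alpha)
        scores = feature_scores(loglik, remaining)
        worst = min(remaining, key=lambda f: scores[f])
        remaining.remove(worst)
        eliminated.append(worst)
    eliminated.extend(remaining)  # the last survivor ranks first
    return eliminated[::-1]


if __name__ == "__main__":
    # Toy term-count matrix: term 0 favors class 0, term 1 favors class 1,
    # term 2 occurs equally often in both classes.
    X = [[5, 0, 2], [4, 1, 2], [0, 5, 2], [1, 4, 2]]
    y = [0, 0, 1, 1]
    print(nb_rfe_ranking(X, y))
```

On the toy data, the term that occurs equally often in both classes is eliminated first and therefore ends up last in the ranking. The point of refitting inside the loop is that the smoothed probabilities are renormalized over the surviving features, so scores can change as features are removed; a single-pass ranking would skip that step.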