Robustness of regularized linear classification methods in text categorization

Authors:
Jian Zhang;Yiming Yang
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2003

Citing 10
Cited 17

An example-based mapping method for text categorization and retrieval

ACM Transactions on Information Systems (TOIS)
The nature of statistical learning theory
A comparison of classifiers and document representations for the routing problem

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Matrix computations (3rd ed.)
Inductive learning algorithms and representations for text categorization

Proceedings of the seventh international conference on Information and knowledge management
A re-examination of text categorization methods

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Optimization by Vector Space Methods
Text Categorization Based on Regularized Linear Classification Methods

Information Retrieval
Text Categorization with Support Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning

Feature selection for text categorization on imbalanced data

ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Incorporating prior knowledge with weighted margin support vector machines

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Robustness of adaptive filtering methods in a cross-benchmark evaluation

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Hierarchy-Regularized Latent Semantic Indexing

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Active learning via transductive experimental design

ICML '06 Proceedings of the 23rd international conference on Machine learning
A maximal figure-of-merit (MFoM)-learning approach to robust classifier design for text categorization

ACM Transactions on Information Systems (TOIS)
Constructing informative prior distributions from domain knowledge in text classification

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
The sentimental factor: improving review classification via human-provided information

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Utility-based information distillation over temporally sequenced documents

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Feature selection methods for text classification

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Generalizing from relevance feedback using named entity wildcards

Proceedings of the sixteenth ACM conference on information and knowledge management
Non-greedy active learning for text categorization using convex transductive experimental design

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Complex adaptive filtering user profile using graphical models

Information Processing and Management: an International Journal
A regularization framework for multiclass classification: A deterministic annealing approach

Pattern Recognition
SED: supervised experimental design and its application to text classification

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Discovering links between lexical and surface features in questions and answers

WebKDD'04 Proceedings of the 6th international conference on Knowledge Discovery on the Web: advances in Web Mining and Web Usage Analysis
PERC: a personal email classifier

ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval

Abstract

Real-world applications often require classifying documents when only a small number of features is available, when some documents are mislabeled, and when positive examples are rare. This paper investigates the robustness of three regularized linear classification methods (SVM, ridge regression, and logistic regression) under these conditions. We compare the methods in terms of their loss functions and score distributions, and establish the connection between their optimization problems and generalization error bounds. Several sets of controlled experiments on the Reuters-21578 corpus are conducted to investigate the robustness of these methods. Our results suggest that ridge regression is the most promising candidate for rare-class problems.
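
To illustrate the loss-function comparison mentioned in the abstract, the sketch below writes out the standard textbook losses usually associated with the three methods: hinge loss for SVM, squared loss for ridge regression, and logistic loss for logistic regression, each combined with an L2 penalty. The function names, the margin convention m = y * f(x) with y in {-1, +1}, and the regularization weight `lam` are assumptions made for illustration; the paper's own notation and scaling may differ.

```python
import math

# Each loss is written as a function of the margin m = y * f(x),
# where y is in {-1, +1} and f(x) = w . x is the linear score.
# These are the common textbook forms, assumed here for illustration.

def hinge_loss(margin):
    """SVM loss: zero once the example is classified with margin >= 1."""
    return max(0.0, 1.0 - margin)

def squared_loss(margin):
    """Ridge regression loss: penalizes any deviation of the margin from 1."""
    return (1.0 - margin) ** 2

def logistic_loss(margin):
    """Logistic regression loss: log(1 + exp(-margin)), computed stably."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def regularized_objective(loss, w, examples, lam):
    """Average loss over (x, y) pairs plus an L2 penalty lam * ||w||^2.

    `lam` and the averaging convention are hypothetical choices.
    """
    total = 0.0
    for x, y in examples:
        score = sum(wi * xi for wi, xi in zip(w, x))
        total += loss(y * score)
    return total / len(examples) + lam * sum(wi * wi for wi in w)

if __name__ == "__main__":
    # Print the three losses side by side for a few margins.
    for m in (-1.0, 0.0, 1.0, 3.0):
        print(m, hinge_loss(m), squared_loss(m), logistic_loss(m))
```

One difference is visible directly in the definitions: for a confidently correct example with margin 3, the hinge loss is 0 while the squared loss is 4, so the squared loss also penalizes scores that overshoot the target. Whether this property explains the robustness results reported in the paper is not something this sketch establishes.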