Real-world applications often require classifying documents when only a small number of features is available, some documents are mislabeled, and positive examples are rare. This paper investigates the robustness of three regularized linear classification methods (SVM, ridge regression and logistic regression) under these conditions. We compare the methods in terms of their loss functions and score distributions, and establish the connection between their optimization problems and generalization error bounds. Several sets of controlled experiments on the Reuters-21578 corpus are conducted to investigate the robustness of these methods. Our results suggest that ridge regression is the most promising candidate for rare class problems.
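
To make the loss-function comparison concrete, here is a minimal sketch of the standard textbook losses usually associated with the three methods, written as functions of the margin m = y * f(x) for a label y in {-1, +1}. The function names and the choice of margin values are illustrative assumptions, and the exact formulations (including the regularization terms) used in the paper may differ.

```python
import math


def hinge_loss(margin: float) -> float:
    """Hinge loss, the usual loss for a linear SVM: max(0, 1 - m)."""
    return max(0.0, 1.0 - margin)


def squared_loss(margin: float) -> float:
    """Squared loss, the usual loss for ridge regression.

    For y in {-1, +1}, (y - f(x))^2 equals (1 - m)^2.
    """
    return (1.0 - margin) ** 2


def logistic_loss(margin: float) -> float:
    """Logistic loss log(1 + exp(-m)), computed in a numerically stable way."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Illustrative margins: misclassified, on the boundary, correct, very confident.
    print(f"{'margin':>7} {'hinge':>8} {'squared':>8} {'logistic':>9}")
    for m in (-1.0, 0.0, 1.0, 3.0):
        print(f"{m:7.1f} {hinge_loss(m):8.3f} {squared_loss(m):8.3f} {logistic_loss(m):9.3f}")
```

One difference visible in these definitions is that the squared loss grows again once the margin exceeds 1, whereas the hinge loss is exactly zero there and the logistic loss decays toward zero.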