Large margin DragPushing strategy for centroid text categorization

Authors:
Songbo Tan
Affiliations:
Software Department, Institute of Computing Technology, Chinese Academy of Sciences, P.O. Box 2704, Beijing, 100080, PR China and Graduate School of the Chinese Academy of Sciences, PR China
Venue:
Expert Systems with Applications: An International Journal
Year:
2007

Abstract

Among conventional methods for text categorization, the centroid classifier is simple and efficient. However, it often suffers from inductive bias (or model misfit) incurred by its assumption. DragPushing is a simple yet efficient method for addressing this inductive bias problem. However, DragPushing uses only one criterion, training-set error, as its objective function, which cannot guarantee generalization capability. In this paper, we propose a generalized DragPushing strategy for the centroid classifier, which we call "Large Margin DragPushing" (LMDP). Experiments on three benchmark evaluation collections show that LMDP improved on the performance of DragPushing by about one percent and performed nearly as well as a state-of-the-art SVM, without incurring significant computational cost.
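
The abstract does not give the update rules, so the following is only a rough sketch of the general idea and not the paper's algorithm. It assumes cosine similarity against per-class summed centroids, and an update that adds a training document to its own class centroid ("drag") and subtracts it from the strongest rival centroid ("push"). The names `drag_push`, `margin`, `drag`, `push`, and `epochs` are hypothetical. Setting `margin=0` updates only on training errors, the single criterion the abstract attributes to DragPushing; a positive `margin` is one assumed reading of the "large margin" extension, in which correctly classified documents that win by too little also trigger an update.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_centroids(docs, labels):
    """Build one centroid per class by summing its unit-length documents."""
    centroids = {}
    for vec, label in zip(docs, labels):
        unit = normalize(vec)
        current = centroids.get(label, [0.0] * len(unit))
        centroids[label] = [c + x for c, x in zip(current, unit)]
    return centroids


def scores(centroids, vec):
    """Cosine similarity between a document and every class centroid."""
    unit = normalize(vec)
    return {c: dot(normalize(cen), unit) for c, cen in centroids.items()}


def classify(centroids, vec):
    s = scores(centroids, vec)
    return max(s, key=s.get)


def drag_push(centroids, docs, labels, margin=0.0, drag=1.0, push=1.0, epochs=5):
    """Assumed update rule (needs at least two classes).

    A document triggers an update when its true-class score does not beat
    the best rival score by at least `margin`.
    """
    for _ in range(epochs):
        for vec, label in zip(docs, labels):
            s = scores(centroids, vec)
            rival = max((c for c in s if c != label), key=s.get)
            if s[label] - s[rival] < margin or s[label] < s[rival]:
                unit = normalize(vec)
                # Drag the true-class centroid toward the document.
                centroids[label] = [a + drag * x for a, x in zip(centroids[label], unit)]
                # Push the rival centroid away from the document.
                centroids[rival] = [a - push * x for a, x in zip(centroids[rival], unit)]
    return centroids
```

For example, starting from centroids `{"a": [1.0, 0.0], "b": [0.0, 1.0]}`, a document `[0.6, 0.8]` labeled `"a"` is initially assigned to `"b"`; a single pass of `drag_push` moves the two centroids so that the same document is assigned to `"a"`.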