A novel refinement approach for text categorization

Authors:
Songbo Tan; Xueqi Cheng; Moustafa M. Ghanem; Bin Wang; Hongbo Xu
Affiliations:
ICT, Beijing, China & Chinese Academy of Sciences, China; ICT, Beijing, China; Imperial College London, London, UK; ICT, Beijing, China; ICT, Beijing, China
Venue:
Proceedings of the 14th ACM international conference on Information and knowledge management
Year:
2005

Abstract

In this paper we present a novel strategy, DragPushing, for improving the performance of text classifiers. The strategy is generic and uses training errors to successively refine the classification model of a base classifier. We describe how it is applied to generate two new classification algorithms: a Refined Centroid Classifier and a Refined Naïve Bayes Classifier. We present an extensive experimental evaluation of both algorithms on three English collections and one Chinese corpus. The results indicate that, in each case, the refined classifiers achieve a significant performance improvement over the base classifiers used. Furthermore, the performance of the Refined Centroid Classifier as implemented is comparable to, if not better than, that of a state-of-the-art support vector machine (SVM)-based classifier, at a much lower computational cost.
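The abstract does not spell out the update rule, so the following is only a minimal sketch of how error-driven refinement of a centroid classifier might look. It assumes that, for each misclassified training document, the centroid of the true class is dragged toward the document and the centroid of the wrongly predicted class is pushed away from it. The function names, the `weight` step size, and the `max_iter` stopping rule are hypothetical choices made for illustration, not details taken from the paper.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_centroids(docs, labels):
    """Base centroid classifier: one summed document vector per class."""
    centroids = {}
    for d, y in zip(docs, labels):
        c = centroids.setdefault(y, [0.0] * len(d))
        for i, x in enumerate(d):
            c[i] += x
    return centroids


def predict(centroids, d):
    """Return the class whose normalized centroid is most similar to d."""
    return max(sorted(centroids),
               key=lambda y: dot(normalize(centroids[y]), d))


def refine(centroids, docs, labels, weight=0.5, max_iter=10):
    """Assumed drag/push refinement driven by training errors.

    Works on a copy, so the base centroids are left untouched.
    """
    refined = {y: list(c) for y, c in centroids.items()}
    for _ in range(max_iter):
        errors = 0
        for d, y in zip(docs, labels):
            guess = predict(refined, d)
            if guess != y:
                errors += 1
                for i, x in enumerate(d):
                    refined[y][i] += weight * x      # drag the true class closer
                    refined[guess][i] -= weight * x  # push the wrong class away
        if errors == 0:  # stop once the training set is classified correctly
            break
    return refined


# Toy usage with unit-length two-term document vectors (illustrative data only).
docs = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
labels = ["a", "a", "a", "a", "b"]
base = train_centroids(docs, labels)
refined = refine(base, docs, labels)
```

In this toy data the base centroid of class `a` is dominated by the three identical documents, so the fourth document of class `a` initially falls closer to class `b`; the refinement pass is what corrects it. Real text collections would use sparse term-weight vectors rather than dense lists.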