Adapting centroid classifier for document categorization

Authors:
Songbo Tan; Yuefen Wang; Gaowei Wu
Affiliations:
Key Laboratory of Network, Institute of Computing Technology, Chinese Academy of Sciences, China; Information Center, Chinese Academy of Geological Sciences, China; Key Laboratory of Network, Institute of Computing Technology, Chinese Academy of Sciences, China
Venue:
Expert Systems with Applications: An International Journal
Year:
2011

Abstract

In the information retrieval community, the Centroid Classifier has been shown to be a simple yet effective method for text categorization. However, it often suffers from model misfit (or inductive bias) incurred by its assumption. Various methods have been proposed to address this issue, such as Weight Adjustment, Voting, Refinement and DragPushing. However, existing methods employ only one criterion, i.e., training-set error. Research in machine learning indicates that methods based on training-set error alone cannot guarantee the generalization capability of base classifiers on unseen examples. To overcome this problem, we propose a novel Model Adjustment algorithm, which makes use of training-set errors as well as training-set margins. Furthermore, we prove that for a linearly separable problem, the proposed method converges to the optimal solution after a finite number of updates using any learning parameter η (η > 0). The empirical assessment conducted on four benchmark collections indicates that the proposed method performs slightly better than the SVM classifier in prediction accuracy and also beats it in running time.
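
The idea of adjusting centroids using training-set margins as well as training-set errors can be illustrated with a small sketch. The Python below is not the authors' Model Adjustment algorithm, whose update rule is not stated in the abstract. It assumes a perceptron-style update in which the true class centroid is moved toward a document and the strongest rival centroid is moved away whenever the margin falls below a threshold. The function names, the `min_margin` threshold, the dot-product scoring and the toy data are all assumptions made for illustration; only the learning parameter `eta` (η > 0) corresponds to a quantity named in the abstract.

```python
import math

def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_centroids(docs, labels):
    """Plain centroid classifier: average the unit-length vectors of each class."""
    groups = {}
    for d, y in zip(docs, labels):
        groups.setdefault(y, []).append(normalize(d))
    return {y: [sum(col) / len(vecs) for col in zip(*vecs)]
            for y, vecs in groups.items()}

def margin(centroids, d, y):
    """Score of the true class minus the best rival score (needs >= 2 classes)."""
    rival = max((k for k in centroids if k != y),
                key=lambda k: dot(centroids[k], d))
    return dot(centroids[y], d) - dot(centroids[rival], d), rival

def adjust(centroids, docs, labels, eta=0.5, min_margin=0.1, epochs=50):
    """Hypothetical adjustment loop: update on errors and on thin margins."""
    docs = [normalize(d) for d in docs]
    for _ in range(epochs):
        changed = False
        for d, y in zip(docs, labels):
            m, rival = margin(centroids, d, y)
            # m <= 0: misclassified or tied; small positive m: thin margin.
            if m < min_margin:
                centroids[y] = [c + eta * x for c, x in zip(centroids[y], d)]
                centroids[rival] = [c - eta * x for c, x in zip(centroids[rival], d)]
                changed = True
        if not changed:  # every training document clears the margin threshold
            break
    return centroids

def predict(centroids, doc):
    """Assign the class whose centroid has the largest dot product with the document."""
    d = normalize(doc)
    return max(centroids, key=lambda y: dot(centroids[y], d))

# Toy two-term "documents" in two classes (made-up data).
docs = [[3, 1], [2, 0], [0, 2], [1, 3]]
labels = ["a", "a", "b", "b"]
model = adjust(train_centroids(docs, labels), docs, labels)
```

Setting `min_margin` to 0 reduces the loop to an error-only criterion, which is the kind of single-criterion adjustment the abstract argues is insufficient for generalization.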