A class-feature-centroid classifier for text categorization

Authors:
Hu Guan;Jingyu Zhou;Minyi Guo
Affiliations:
Shanghai Jiao Tong University, Shanghai, China;Shanghai Jiao Tong University, Shanghai, China;Shanghai Jiao Tong University, Shanghai, China
Venue:
Proceedings of the 18th international conference on World wide web
Year:
2009

Abstract

Automated text categorization is an important technique for many web applications, such as document indexing, document filtering, and cataloging web resources. Many different approaches have been proposed for the automated text categorization problem. Among them, centroid-based approaches have the advantage of short training and testing times because of their computational efficiency. As a result, centroid-based classifiers have been widely used in many web applications. However, the accuracy of centroid-based classifiers is inferior to that of SVM, mainly because the centroids found during construction are far from their ideal locations. We design a fast Class-Feature-Centroid (CFC) classifier for multi-class, single-label text categorization. In CFC, a centroid is built from two important class distributions: the inter-class term index and the inner-class term index. CFC proposes a novel combination of these indices and employs a denormalized cosine measure to calculate the similarity score between a text vector and a centroid. Experiments on the Reuters-21578 corpus and the 20-newsgroup email collection show that CFC consistently outperforms state-of-the-art SVM classifiers on both micro-F1 and macro-F1 scores. In particular, CFC is more effective and robust than SVM when data is sparse.
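
The abstract names the ingredients of CFC but does not give its formulas, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes the inner-class index is the fraction of a class's documents that contain a term, the inter-class index is the logarithm of the number of classes divided by the number of classes containing the term, the two are combined as `b ** inner * inter`, and "denormalized cosine" means the document vector is length-normalized while the centroid is not. The function names `train_cfc` and `classify` and the default value of `b` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_cfc(docs, labels, b=math.e - 1.7):
    """Build one centroid per class from tokenized documents.

    docs: list of token lists; labels: parallel list of class names.
    The weighting formula and the default b are assumptions (see lead-in).
    """
    class_sizes = Counter(labels)
    # doc_freq[c][t] = number of documents in class c that contain term t
    doc_freq = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        doc_freq[label].update(set(tokens))
    # class_freq[t] = number of classes in which term t appears
    class_freq = Counter()
    for term_counts in doc_freq.values():
        class_freq.update(term_counts.keys())
    n_classes = len(class_sizes)
    centroids = {}
    for label, term_counts in doc_freq.items():
        centroids[label] = {
            # inner-class index (exponent) times inter-class index (log factor)
            term: (b ** (count / class_sizes[label]))
            * math.log(n_classes / class_freq[term])
            for term, count in term_counts.items()
        }
    return centroids


def classify(tokens, centroids):
    """Return the class whose centroid scores highest, or None if no tokens."""
    tf = Counter(tokens)
    doc_norm = math.sqrt(sum(v * v for v in tf.values()))
    if doc_norm == 0:
        return None

    def score(centroid):
        # Only the document vector is normalized; the centroid keeps its length.
        dot = sum(v * centroid.get(t, 0.0) for t, v in tf.items())
        return dot / doc_norm

    return max(centroids, key=lambda label: score(centroids[label]))


if __name__ == "__main__":
    docs = [["oil", "price", "the"], ["oil", "barrel", "the"],
            ["match", "goal", "the"], ["goal", "team", "the"]]
    labels = ["energy", "energy", "sport", "sport"]
    model = train_cfc(docs, labels)
    print(classify(["the", "oil", "price"], model))
```

Under these assumptions, a term that occurs in every class receives an inter-class factor of log(1) = 0 and so contributes nothing to any centroid, which is one way a classifier of this kind can favor class-discriminative terms.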