Automated text categorization is an important technique for many web applications, such as document indexing, document filtering, and cataloging web resources. Many different approaches have been proposed for the automated text categorization problem. Among them, centroid-based approaches have the advantage of short training and testing times because they are computationally efficient. As a result, centroid-based classifiers have been widely used in many web applications. However, the accuracy of centroid-based classifiers is lower than that of SVM classifiers, mainly because the centroids found during construction are far from their ideal locations. We design a fast Class-Feature-Centroid (CFC) classifier for multi-class, single-label text categorization. In CFC, a centroid is built from two important class distributions: the inter-class term index and the inner-class term index. CFC combines these indices in a novel way and employs a denormalized cosine measure to calculate the similarity score between a text vector and a centroid. Experiments on the Reuters-21578 corpus and the 20-newsgroup email collection show that CFC consistently outperforms state-of-the-art SVM classifiers on both micro-F1 and macro-F1 scores. In particular, CFC is more effective and robust than SVM when data is sparse.
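As a rough illustration of the idea described above, the following Python sketch builds one centroid per class from an inner-class and an inter-class term weight and scores a document against each centroid without normalizing the centroid. The abstract does not state the weighting formulas, so everything specific here is an assumption made for illustration: the inner-class weight `b ** (df / class_size)`, the inter-class weight `log(num_classes / cf)`, the base `b = 2.0`, and the function names `train_centroids` and `classify`. It should be read as a sketch under those assumptions, not as the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def train_centroids(docs, labels, b=2.0):
    """Build one weighted centroid per class from tokenized documents.

    Assumed weighting (not taken from the abstract):
      inner-class index: b ** (df of term in class / number of docs in class)
      inter-class index: log(number of classes / number of classes containing term)
    """
    class_sizes = Counter(labels)
    # df[c][t] = number of documents in class c that contain term t
    df = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        df[label].update(set(tokens))
    # cf[t] = number of classes in which term t occurs at least once
    cf = Counter(t for c in df for t in df[c])
    num_classes = len(class_sizes)
    centroids = {}
    for c, term_df in df.items():
        centroids[c] = {
            t: (b ** (n / class_sizes[c])) * math.log(num_classes / cf[t])
            for t, n in term_df.items()
        }
    return centroids


def classify(tokens, centroids):
    """Return the class whose centroid gives the highest score.

    The document vector is length-normalized but the centroid is not,
    which is one reading of a "denormalized cosine" measure (assumption).
    """
    tf = Counter(tokens)
    norm = math.sqrt(sum(v * v for v in tf.values()))
    if norm == 0:
        return None  # empty document: nothing to score

    def score(centroid):
        return sum(v / norm * centroid.get(t, 0.0) for t, v in tf.items())

    return max(centroids, key=lambda c: score(centroids[c]))


# Toy usage with made-up data
docs = [
    ["oil", "price", "barrel"],
    ["oil", "crude", "price"],
    ["match", "goal", "team"],
    ["team", "coach", "goal"],
]
labels = ["energy", "energy", "sport", "sport"]
centroids = train_centroids(docs, labels)
print(classify(["oil", "barrel"], centroids))
print(classify(["goal", "coach"], centroids))
```

Under this assumed weighting, a term that appears in every class receives an inter-class weight of zero, so only class-discriminating terms contribute to a centroid. Leaving the centroid unnormalized lets the magnitude of those weights influence the score directly.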