A multi-classifier system for text categorization

Authors:
Shubhamoy Dey
Affiliations:
Indian Institute of Management, Prabandh Shikhar, Rau, Indore, India
Venue:
Proceedings of the 2011 ACM Symposium on Research in Applied Computation
Year:
2011

Abstract

Text categorization, the assignment of text documents to one or more pre-defined categories, is one of the most intensely researched text mining tasks. The task may be subdivided into two main parts: the representation of the text documents in some form of numerical vector space, and the application of a suitable supervised learning technique. This research focuses on the second part of the problem. The paper proposes constructing a classification model for each of the pre-defined categories (themes) present in a corpus, using a term-frequency-based 'keyword' identification and document scoring technique. The documents misclassified by each of these category-specific classifier models are then re-classified with the help of the other models. The effectiveness of the approach is demonstrated by experiments on two publicly available BBC News corpora, and good classification accuracy is observed on both. Specifically, the macro-averaged and micro-averaged F-measures of the proposed method on the evaluation dataset for the BBC Sports corpus are 94.7% and 94.3%, respectively.
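
The abstract does not spell out the keyword-selection or scoring rules, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's method. It assumes that each category's "keywords" are simply its `k` most frequent training terms, that a document's score for a category is the number of its tokens found among those keywords, and that the final label is the category whose model scores highest (a stand-in for the re-classification step, whose details are not given here). All function names are hypothetical, and stop-word removal is omitted for brevity. The last function shows how macro-averaged and micro-averaged F-measures are computed for single-label predictions.

```python
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption; the paper's preprocessing is not given).
    return text.lower().split()


def top_keywords(docs, k):
    # Assumed keyword rule: the k most frequent terms across a category's documents.
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return {term for term, _ in counts.most_common(k)}


def build_models(train, k):
    # One keyword set ("model") per category; train maps category -> list of documents.
    return {cat: top_keywords(docs, k) for cat, docs in train.items()}


def score(doc, keywords):
    # Assumed scoring rule: number of document tokens that are category keywords.
    return sum(1 for tok in tokenize(doc) if tok in keywords)


def classify(doc, models):
    # Assumed decision rule: pick the category whose model scores the document highest.
    scores = {cat: score(doc, kw) for cat, kw in models.items()}
    return max(scores, key=scores.get)


def f_measures(gold, pred):
    # Macro- and micro-averaged F1 for single-label classification.
    cats = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1

    def f1(t, false_pos, false_neg):
        denom = 2 * t + false_pos + false_neg
        return 2 * t / denom if denom else 0.0

    macro = sum(f1(tp[c], fp[c], fn[c]) for c in cats) / len(cats)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


if __name__ == "__main__":
    # Toy data invented for illustration only; not drawn from the BBC corpora.
    train = {
        "sport": ["goal goal goal match match team"],
        "tech": ["chip chip chip software software phone"],
    }
    models = build_models(train, k=2)
    print(classify("the match ended with a late goal", models))
    print(f_measures(["sport", "sport", "tech", "tech"],
                     ["sport", "tech", "tech", "tech"]))
```

The macro average weights every category equally, while the micro average pools the counts over all documents, so the two can differ when categories are imbalanced, as the two figures reported for the BBC Sports corpus do.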