Text Categorization: An Experiment Using Phrases

Authors:
Madhusudhan Kongovi, Juan Carlos Guzman, Venu Dasigi
Venue:
Proceedings of the 24th BCS-IRSG European Colloquium on IR Research: Advances in Information Retrieval
Year:
2002

Abstract

Typical text classifiers learn from training documents that have been manually categorized. In this research, we classified newswire articles using category profiles, which we built by selecting feature words and phrases from the training documents. We used the Reuters-21578 text corpus for our experiments and measured the effectiveness of our classifier with precision and recall. Although our experiments with words yielded good results, we found instances where the phrase-based approach was more effective. This may be because a word considered together with its adjoining word (a phrase) can be a good discriminator when building a category profile. Packaging words in pairs may capture some semantic value, and it also filters out words that occur frequently in isolation but carry little weight in characterizing a category.
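The abstract does not spell out how features are selected or weighted, so what follows is only a minimal Python sketch under stated assumptions. A "phrase" is taken to be a pair of adjacent words. Each category profile keeps its `top_k` most frequent features, and a document is assigned to the single most similar profile by cosine similarity. Precision and recall are computed per category. The function names, `top_k`, the cosine measure, the single-label assignment (Reuters-21578 documents can carry several categories), and the toy training data are all hypothetical. They are not the authors' method or Reuters-21578 data.

```python
from collections import Counter
import math
import re

def tokenize(text):
    # Assumed preprocessing: lowercase and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())

def word_pairs(tokens):
    # A "phrase" here is a word together with its adjoining word (a bigram).
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

def features(text, use_phrases):
    tokens = tokenize(text)
    return word_pairs(tokens) if use_phrases else tokens

def build_profiles(training, use_phrases=False, top_k=50):
    # training: list of (text, category) pairs.
    # Hypothetical selection rule: keep the top_k most frequent features per category.
    counts = {}
    for text, category in training:
        counts.setdefault(category, Counter()).update(features(text, use_phrases))
    return {cat: dict(c.most_common(top_k)) for cat, c in counts.items()}

def cosine(a, b):
    # Cosine similarity between two sparse feature -> weight mappings.
    dot = sum(w * b.get(f, 0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(text, profiles, use_phrases=False):
    # Single-label assumption: pick the category whose profile is most similar.
    doc = Counter(features(text, use_phrases))
    return max(profiles, key=lambda cat: cosine(doc, profiles[cat]))

def precision_recall(predicted, actual, category):
    # Per-category precision = TP / (TP + FP), recall = TP / (TP + FN).
    pairs = list(zip(predicted, actual))
    tp = sum(p == category and a == category for p, a in pairs)
    fp = sum(p == category and a != category for p, a in pairs)
    fn = sum(p != category and a == category for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical toy data, for illustration only.
training = [
    ("interest rates rose as the central bank tightened", "interest"),
    ("the central bank cut interest rates again", "interest"),
    ("crude oil prices fell on oversupply", "crude"),
    ("oil prices rose after opec cut output", "crude"),
]
profiles = build_profiles(training, use_phrases=True)
print(classify("the central bank raised interest rates", profiles, use_phrases=True))
```

In this toy run the phrases "the central", "central bank", and "interest rates" all match the `interest` profile, while none match `crude`. Calling `build_profiles(training, use_phrases=False)` gives the word-based baseline for comparison.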