Thinking (vol. 3)
An evaluation of phrasal and clustered representations on a text categorization task
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Representation and learning in information retrieval
Information extraction as a basis for high-precision text classification
ACM Transactions on Information Systems (TOIS)
NEWPAR: an automatic feature selection and weighting schema for category ranking
Proceedings of the 2006 ACM symposium on Document engineering
Proceedings of the 2006 conference on Advances in Intelligent IT: Active Media Technology 2006
Proceedings of the 6th Balkan Conference in Informatics
Typical text classifiers learn from example training documents that have been manually categorized. In this research, we classified newswire articles using category profiles, which we built by selecting feature words and phrases from the training documents. We ran our experiments on the Reuters-21578 text corpus and measured the effectiveness of our classifier with precision and recall. Although our experiments with words yielded good results, we found instances where the phrase-based approach was more effective. A possible explanation is that a word taken together with its adjoining word (a phrase) can be a better discriminator when building a category profile. This tight packaging of word pairs may carry some semantic value. It also filters out words that occur frequently in isolation but carry little weight in characterizing the category.
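The abstract does not say how features were selected or how documents were matched against profiles, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a profile is simply the most frequent words or adjacent word pairs in a category's training documents, and that a document is assigned to the category whose profile it overlaps most. The function names, the `top_k` parameter, and the frequency-based selection are all hypothetical.

```python
from collections import Counter


def features(text, use_phrases=False):
    """Return single words, or adjacent word pairs (phrases) if requested."""
    tokens = text.lower().split()
    if use_phrases:
        return [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens


def build_profile(docs, use_phrases=False, top_k=3):
    """Assumed selection rule: keep the top_k most frequent features."""
    counts = Counter()
    for doc in docs:
        counts.update(features(doc, use_phrases))
    return {feature for feature, _ in counts.most_common(top_k)}


def classify(text, profiles, use_phrases=False):
    """Assumed matching rule: pick the profile with the largest overlap."""
    feats = set(features(text, use_phrases))
    return max(profiles, key=lambda cat: len(profiles[cat] & feats))


def precision_recall(predicted, actual, category):
    """Precision and recall for one category over paired label lists."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == category and a == category for p, a in pairs)
    fp = sum(p == category and a != category for p, a in pairs)
    fn = sum(p != category and a == category for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With `use_phrases=True`, a word such as "prices" contributes to a profile only as part of pairs like "oil prices", which illustrates how pairing can filter out words that are frequent in isolation but not characteristic of a category.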