Feature annotation for text categorization

Authors:
Yashodhara Haribhakta;Santosh Kalamkar;Parag Kulkarni
Affiliations:
College of Engineering, Pune Shivajinagar, Pune, Maharashtra, India;College of Engineering, Pune Shivajinagar, Pune, Maharashtra, India;College of Engineering, Pune Shivajinagar, Pune, Maharashtra, India
Venue:
Proceedings of the CUBE International Information Technology Conference
Year:
2012

Abstract

In text categorization, feature extraction is one of the major strategies for making text classifiers more efficient and accurate. Quickly selecting a suitable feature extraction strategy from the many proposed in previous studies is difficult. In this paper, we propose an efficient entity extraction approach to feature extraction that contributes to accurate text categorization. The entities identified in the proposed approach are person name, organization name, location, and date. We use the GATE tool to extract these entities. Once the entities are identified, we annotate each of them in the original text with parameters. Three measures are used for feature selection: term frequency (TF), information gain (IG), and chi-square (χ²). The effectiveness and accuracy of the entity-annotated features are judged by using them for classification and comparing the results against those obtained with non-annotated features. The experiments are performed on standard benchmark datasets such as the NSF Abstract datasets and Reuters-21578. The experimental results show that text categorization with the annotated features is more accurate than with non-annotated features on the NSF Abstract-Title dataset. For Reuters-21578, however, there was no significant improvement in classification accuracy.
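
The three feature selection measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes tokenized documents, a one-vs-rest (binary) formulation of information gain and chi-square over document counts, and a hypothetical `token/TYPE` notation for entity-annotated features; the abstract does not specify the annotation format, and all function names here are illustrative.

```python
import math
from collections import Counter


def term_frequency(docs):
    """Count total occurrences of each term across tokenized documents."""
    tf = Counter()
    for tokens in docs:
        tf.update(tokens)
    return tf


def contingency(docs, labels, term, category):
    """Return 2x2 document counts for a term and a category:
    a: term present, in category      b: term present, other category
    c: term absent, in category       d: term absent, other category
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1
        elif has_term:
            b += 1
        elif in_cat:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a, b, c, d):
    """Chi-square statistic of the 2x2 table (0.0 if a marginal is empty)."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def entropy(counts):
    """Shannon entropy (bits) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k)


def information_gain(a, b, c, d):
    """Reduction in class entropy from knowing whether the term is present."""
    n = a + b + c + d
    h_class = entropy([a + c, b + d])
    h_given_term = ((a + b) / n) * entropy([a, b]) + ((c + d) / n) * entropy([c, d])
    return h_class - h_given_term


if __name__ == "__main__":
    # Toy corpus; "mit/ORGANIZATION" uses the assumed annotation notation.
    docs = [
        ["grant", "awarded", "mit/ORGANIZATION"],
        ["mit/ORGANIZATION", "research", "grant"],
        ["oil", "prices", "rise"],
        ["oil", "exports", "fall"],
    ]
    labels = ["science", "science", "economy", "economy"]
    table = contingency(docs, labels, "mit/ORGANIZATION", "science")
    print(term_frequency(docs)["mit/ORGANIZATION"])
    print(chi_square(*table), information_gain(*table))
```

Features would then be ranked by the chosen score and the top-ranked ones kept for classification. Running the same ranking on annotated and non-annotated token lists gives the kind of comparison the abstract describes.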