A negative category based approach for Wikipedia document classification

Authors:
Meenakshi Sundaram Murugeshan;K. Lakshmi;Saswati Mukherjee
Affiliations:
Department of Computer Science and Engineering, College of Engineering, Guindy, Anna University, Chennai-600025, India.;Department of Computer Science and Engineering, College of Engineering, Guindy, Anna University, Chennai-600025, India.;Department of Computer Science and Engineering, College of Engineering, Guindy, Anna University, Chennai-600025, India
Venue:
International Journal of Knowledge Engineering and Data Mining
Year:
2010

Abstract

Profile-based methods have been used successfully to classify unstructured text. This paper presents a profile-based method for classifying Wikipedia XML documents, in which the class profiles are built using negative category information. Our approach exploits the structure of Wikipedia documents when building profiles. Two class profiles are built: one based on the whole content of the documents and the other based on their initial description. In addition, we explore the use of terms from section and subsection titles. We compare the effectiveness of the cosine and fractional similarity measures for classifying XML documents. Experiments show that combining the two profile-based classifiers performs better than either classifier alone.
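The abstract does not give the exact profile formula, so the following is only a minimal sketch of the general idea. It assumes a Rocchio-style profile, in which each category's profile is the centroid of its own documents minus a weighted centroid of the documents from all other (negative) categories. It also assumes raw term-frequency vectors, cosine similarity, and a linear combination of the scores from a whole-content classifier and an initial-description classifier. The names `build_profiles`, `classify`, `beta`, and `alpha` are hypothetical. Fractional similarity is left out because its definition is not given here.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector (assumption: whitespace tokenisation, lowercased)."""
    return Counter(text.lower().split())


def add_scaled(acc, vec, scale):
    """Add scale * vec into the sparse dict acc in place."""
    for term, w in vec.items():
        acc[term] = acc.get(term, 0.0) + scale * w


def build_profiles(docs, labels, beta=0.25):
    """Hypothetical Rocchio-style profiles: positive centroid minus beta * negative centroid."""
    vectors = [tf_vector(d) for d in docs]
    profiles = {}
    for c in set(labels):
        pos = [v for v, l in zip(vectors, labels) if l == c]
        neg = [v for v, l in zip(vectors, labels) if l != c]
        profile = {}
        for v in pos:
            add_scaled(profile, v, 1.0 / len(pos))
        for v in neg:  # negative category information lowers shared terms
            add_scaled(profile, v, -beta / len(neg))
        profiles[c] = profile
    return profiles


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(full_text, intro_text, full_profiles, intro_profiles, alpha=0.5):
    """Combine the two classifiers by a weighted sum of cosine scores (alpha is hypothetical)."""
    full_vec, intro_vec = tf_vector(full_text), tf_vector(intro_text)
    scores = {
        c: alpha * cosine(full_vec, full_profiles[c])
        + (1 - alpha) * cosine(intro_vec, intro_profiles[c])
        for c in full_profiles
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    full_docs = ["the river flows through the valley", "football match ended in a draw",
                 "the lake and river basin", "team won the league match"]
    intros = ["river valley", "football match", "lake basin", "league team"]
    labels = ["geography", "sport", "geography", "sport"]
    full_p = build_profiles(full_docs, labels)
    intro_p = build_profiles(intros, labels)
    print(classify("a long river crosses the valley", "river", full_p, intro_p))  # geography
```

Subtracting the negative centroid pushes down the weight of terms that also occur in other categories, such as "the" and "match" in the toy data. This is one plausible reading of "negative category information". A weighted sum is only one simple way to combine two classifiers.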