An alternative approach for statistical single-label document classification of newspaper articles

Authors:
Georgios Mamakis;Athanasios G. Malamos;J. Andrew Ware
Affiliations:
Technological Educational Institute of Crete, Greece and Department of Computing and Mathematical Sciences, University of Glamorgan, Wales, UK;Technological Educational Institute of Crete, Greece;Department of Computing and Mathematical Sciences, University of Glamorgan, Wales, UK
Venue:
Journal of Information Science
Year:
2011

Abstract

Text classification is one of the most important areas of machine learning. It enables a range of tasks, including email spam filtering and context identification. Classification theory proposes a number of techniques based on different technologies and tools. Classification systems are typically divided into single-label and multi-label categorization systems, according to the number of categories they assign to each classified document. In this paper, we present work on single-label classification that resulted in a statistical classifier based on the Naive Bayes assumption that word occurrences in a document are statistically independent. Our algorithm takes cross-category word occurrence into account when deciding the class of an arbitrary document. Moreover, instead of estimating word co-occurrence when assigning a class, we estimate the contribution of each word to a document belonging to a class. This approach outperforms other statistical classifiers, such as the Naive Bayes classifier and language models, as our results demonstrate.
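
For orientation, the Naive Bayes baseline named in the abstract can be sketched as follows. This is a minimal illustration of a standard multinomial Naive Bayes classifier with add-one smoothing, not the algorithm proposed in the paper, whose details the abstract does not give. The function names, the toy corpus, and the choice of smoothing are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect per-class document and word counts.

    labeled_docs: iterable of (list_of_tokens, label) pairs (hypothetical format).
    """
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in labeled_docs:
        class_doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_doc_counts, word_counts, vocabulary


def classify(tokens, model):
    """Return the single label with the highest log-posterior score."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    vocab_size = len(vocabulary)
    best_label, best_score = None, -math.inf
    for label, doc_count in class_doc_counts.items():
        # Log prior: fraction of training documents in this class.
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocabulary:
                continue  # words never seen in training are ignored
            # Independence assumption: per-word log-likelihoods are summed.
            # Add-one smoothing is an assumption of this sketch.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy corpus, invented purely for illustration.
TOY_CORPUS = [
    ("team wins match goal".split(), "sports"),
    ("player scores goal".split(), "sports"),
    ("parliament votes budget".split(), "politics"),
    ("minister announces budget vote".split(), "politics"),
]
model = train_naive_bayes(TOY_CORPUS)
print(classify("goal match".split(), model))
```

Each document receives exactly one label, which is the single-label setting the abstract describes. Each class is scored only from its own word counts here, whereas the abstract states that the proposed classifier also takes cross-category word occurrence into account.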