Text Document Preprocessing with the Bayes Formula for Classification Using the Support Vector Machine

Authors:
Dino Isa; Lam H. Lee; V. P. Kallimani; R. RajKumar
Affiliations:
The University of Nottingham, Malaysia Campus, Semenyih; University of Nottingham, Semenyih; University of Nottingham, Malaysia Campus, Semenyih; -
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2008

Abstract

This work implements an enhanced hybrid classification method that combines the naïve Bayes classifier with the Support Vector Machine (SVM). The Bayes formula is used to vectorize (as opposed to classify) a document according to a probability distribution over the categories to which the document may belong. For a predetermined set of topics, such as those found in the "20 newsgroups" dataset, the Bayes formula assigns the document a probability for each topic. Using this probability distribution as the vector that represents the document, the SVM can then classify the documents in a multidimensional space. Classifying with the naïve Bayes classifier alone keeps only the highest probability, which causes an inadvertent dimensionality reduction; the SVM overcomes this by employing the probability values associated with every category for each document. This method can be used for any dataset and shows a significant reduction in training time compared to the LSquare method, and a significant improvement in classification accuracy compared to pure naïve Bayes systems and to TF-IDF/SVM hybrids.
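
The vectorization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a multinomial naïve Bayes model with add-one (Laplace) smoothing and whitespace tokenization, neither of which the abstract specifies. The function names `train_naive_bayes` and `posterior_vector` are hypothetical, and the four training documents are invented toy data. The sketch stops at the probability vector; in the hybrid method this vector, rather than the single most probable category, would be handed to an SVM.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Collect per-category word counts and document counts (multinomial model)."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter(labels)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def posterior_vector(text, model, categories):
    """Return [P(c | text) for c in categories] using the Bayes formula."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = []
    for c in categories:
        total_words = sum(word_counts[c].values())
        score = math.log(doc_counts[c] / total_docs)  # log prior P(c)
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # skip words never seen during training
            # Add-one smoothing: an assumption of this sketch
            score += math.log((word_counts[c][tok] + 1) / (total_words + len(vocab)))
        log_scores.append(score)
    # Normalize in log space so the resulting vector sums to 1
    top = max(log_scores)
    unnormalized = [math.exp(s - top) for s in log_scores]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


# Toy data, invented for illustration only
docs = [
    "the team won the hockey game",
    "the goalie saved the puck",
    "the rocket reached orbit",
    "nasa launched a new satellite",
]
labels = ["sport", "sport", "space", "space"]
categories = sorted(set(labels))  # ['space', 'sport']
model = train_naive_bayes(docs, labels)

# One probability per category: this vector is the document representation
vec = posterior_vector("the satellite reached orbit", model, categories)

# Pure naive Bayes would keep only the top category and discard the rest
nb_only = categories[vec.index(max(vec))]
print(categories, vec, nb_only)
```

With 20 categories, as in the "20 newsgroups" dataset, each document would become a 20-dimensional vector, which is far smaller than a typical TF-IDF representation.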