A hybrid text classification approach with low dependency on parameter by integrating K-nearest neighbor and support vector machine

Authors:
Chin Heng Wan;Lam Hong Lee;Rajprasad Rajkumar;Dino Isa
Affiliations:
Faculty of Information and Communication Technology, Universiti Tunku Abdul Rahman, 31900 Kampar, Perak, Malaysia;Intelligent Systems Research Group, Faculty of Engineering, The University of Nottingham, Malaysia Campus, Jalan Broga, 43500 Semenyih, Selangor, Malaysia;Intelligent Systems Research Group, Faculty of Engineering, The University of Nottingham, Malaysia Campus, Jalan Broga, 43500 Semenyih, Selangor, Malaysia;Intelligent Systems Research Group, Faculty of Engineering, The University of Nottingham, Malaysia Campus, Jalan Broga, 43500 Semenyih, Selangor, Malaysia
Venue:
Expert Systems with Applications: An International Journal
Year:
2012

Abstract

This work presents a new text document classifier that integrates the K-nearest neighbor (KNN) classification approach with the support vector machine (SVM) training algorithm. The proposed Nearest Neighbor-Support Vector Machine hybrid classification approach is named SVM-NN. KNN has been reported as one of the widely used text classification approaches because of its simplicity and its efficiency in handling various types of text classification tasks. However, a major problem with KNN is determining an appropriate value for the parameter K, because the choice of K has a strong impact on the accuracy of the classifier. In addition, KNN is a lazy learning method that keeps all training samples until classification time, so its computation becomes intensive as the value of K increases. In this paper, we propose the SVM-NN hybrid classification approach with the objective of minimizing the impact of the parameter on classification accuracy. In the training stage, the SVM is used to reduce the training samples of each available category to their support vectors (SVs). The SVs of the different categories then serve as the training data for a nearest neighbor classification algorithm, in which the Euclidean distance function is used to calculate the average distance between the testing data point and each category's set of SVs. The testing data point is assigned to the category whose SVs have the shortest average distance to it. Experiments on several benchmark text datasets show that the classification accuracy of the SVM-NN approach depends less on the value of the parameter than that of the conventional KNN classification model.
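
The classification stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the support vectors of each category have already been extracted by some SVM trainer, and the names `average_distance`, `classify_svm_nn`, and `svs_by_category`, as well as the two-dimensional toy vectors, are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean


def average_distance(point, support_vectors):
    """Mean Euclidean distance from `point` to one category's support vectors."""
    return mean(math.dist(point, sv) for sv in support_vectors)


def classify_svm_nn(point, svs_by_category):
    """Return the category whose support vectors are closest on average.

    `svs_by_category` maps a category label to a non-empty list of support
    vectors. The SVM training step that produces them is assumed to have
    happened elsewhere and is not shown here.
    """
    return min(
        svs_by_category,
        key=lambda category: average_distance(point, svs_by_category[category]),
    )


# Toy data (invented for illustration, not taken from the paper).
svs_by_category = {
    "sports": [(0.9, 0.1), (0.8, 0.3)],
    "finance": [(0.1, 0.9), (0.2, 0.7)],
}

predicted = classify_svm_nn((0.7, 0.2), svs_by_category)
print(predicted)
```

Note that this decision rule has no K to choose: every support vector of a category contributes to that category's average distance, which is consistent with the abstract's stated aim of reducing dependence on the parameter.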