The decomposed k-nearest neighbor algorithm for imbalanced text classification

Authors:
Hyung-Seok Kang;Kihyo Nam;Seong-in Kim
Affiliations:
Division of Industrial Management Engineering, Korea University, Seoul, Republic of Korea;UMLogics Co., Ltd., Seongnam-city, Kyungki-do, Republic of Korea;Division of Industrial Management Engineering, Korea University, Seoul, Republic of Korea
Venue:
FGIT'12 Proceedings of the 4th international conference on Future Generation Information Technology
Year:
2012

Abstract

As the volume of textual data has grown exponentially, there is an increasing need to automatically classify relevant data into one of a set of pre-defined classes. Many practical applications assume that training data are evenly distributed among all classes, but in practice they suffer from class imbalance. Several algorithms and re-sampling methods have been proposed to overcome the imbalance problem, but they still suffer from overfitting and information loss. This paper proposes the Decomposed K-Nearest Neighbor (DCM-KNN) algorithm. In the training step, DCM-KNN decomposes the training data into a misclassified set and a correctly classified set based on the results of traditional KNN, and finds an appropriate KNN for each set. In the test step, DCM-KNN estimates whether a test instance is more similar to the misclassified set or to the correctly classified set, and applies the corresponding KNN. Experimental results show that the proposed algorithm achieves more accurate results under imbalanced conditions.
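
The two steps described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the "appropriate KNN" for each set is found or how similarity of a test instance to each set is estimated. Here it is assumed that the traditional KNN is evaluated leave-one-out, that each set gets the candidate k with the best leave-one-out accuracy on that set, and that a test instance is routed by the set membership of its single nearest training point. All function names, the Euclidean distance, and the parameters `base_k` and `candidates` are hypothetical.

```python
from collections import Counter
import math


def knn_label(train, x, k, skip=None):
    """Majority label among the k nearest training points (Euclidean distance).

    train is a list of (vector, label) pairs; skip is an index to leave out.
    """
    ranked = sorted(
        (math.dist(vec, x), i) for i, (vec, _) in enumerate(train) if i != skip
    )
    votes = Counter(train[i][1] for _, i in ranked[:k])
    return votes.most_common(1)[0][0]


def decompose(train, k):
    """Training step, part 1: split indices by leave-one-out KNN outcome."""
    correct, missed = [], []
    for i, (vec, label) in enumerate(train):
        predicted = knn_label(train, vec, k, skip=i)
        (correct if predicted == label else missed).append(i)
    return correct, missed


def best_k(train, subset, candidates):
    """Training step, part 2 (assumed rule): the candidate k with the best
    leave-one-out accuracy on the given subset of training indices."""
    def hits(k):
        return sum(
            knn_label(train, train[i][0], k, skip=i) == train[i][1]
            for i in subset
        )
    return max(candidates, key=hits)


def fit(train, base_k=3, candidates=(1, 3, 5)):
    """Decompose the training data and pick one k per set."""
    correct, missed = decompose(train, base_k)
    return {
        "train": train,
        "missed": set(missed),
        "k_correct": best_k(train, correct, candidates) if correct else base_k,
        "k_missed": best_k(train, missed, candidates) if missed else base_k,
    }


def predict(model, x):
    """Test step (assumed routing): use the k of the set that contains the
    nearest training point, then classify with ordinary KNN."""
    train = model["train"]
    nearest = min(range(len(train)), key=lambda i: math.dist(train[i][0], x))
    k = model["k_missed"] if nearest in model["missed"] else model["k_correct"]
    return knn_label(train, x, k)
```

On a toy set with six majority points clustered near the origin and two minority points far from them, the minority points are outvoted by a traditional 3-NN in leave-one-out evaluation and therefore land in the misclassified set, which then receives its own, smaller k. Document vectors in a real text classifier would be high-dimensional term weights and would more likely use cosine similarity; the two-dimensional Euclidean setting is chosen only to keep the sketch short.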