An improved K-nearest-neighbor algorithm for text categorization

Authors:
Shengyi Jiang;Guansong Pang;Meiling Wu;Limin Kuang
Affiliations:
School of Informatics, Guangdong University of Foreign Studies, 510420 Guangzhou, China (all four authors)
Venue:
Expert Systems with Applications: An International Journal
Year:
2012

Abstract

Text categorization is an important tool for managing and organizing the rapidly growing volume of text data. Many text categorization algorithms have been explored in the literature, such as KNN, Naive Bayes and Support Vector Machine. KNN text categorization is effective but relatively inefficient, because classifying a document requires computing its similarity to every training document. In this paper, we propose an improved KNN algorithm for text categorization that builds the classification model by combining a constrained one-pass clustering algorithm with KNN text categorization. Empirical results on three benchmark corpora show that our algorithm substantially reduces the number of text similarity computations and outperforms state-of-the-art KNN, Naive Bayes and Support Vector Machine classifiers. In addition, the classification model constructed by the proposed algorithm can be updated incrementally, and it scales well in many real-world applications.
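The abstract only outlines the method. As a rough illustration of how clustering and KNN could fit together, here is a minimal Python sketch under several assumptions that are not taken from the paper: documents are already sparse term-weight vectors (for example TF-IDF stored as dicts), similarity is cosine, "constrained" is read as "a cluster only absorbs documents of its own class", a document joins its most similar same-class cluster when the similarity reaches a hypothetical `threshold` (otherwise it starts a new cluster), and classification is a similarity-weighted vote among the `k` nearest clusters. `ClusterKNN`, `threshold` and `k` are illustrative names, not the authors' code, and the paper's actual constraint and weighting scheme may differ.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


class Cluster:
    """A label-pure cluster summarized by the running sum of its member vectors.

    Cosine similarity is scale-invariant, so comparing against this sum gives
    the same result as comparing against the centroid (sum / size).
    """

    def __init__(self, label, vec):
        self.label = label
        self.total = dict(vec)
        self.size = 1

    def add(self, vec):
        for t, w in vec.items():
            self.total[t] = self.total.get(t, 0.0) + w
        self.size += 1


class ClusterKNN:
    """Hypothetical sketch: one-pass label-constrained clustering + KNN over clusters."""

    def __init__(self, threshold=0.5, k=3):
        self.threshold = threshold  # assumed similarity cutoff for joining a cluster
        self.k = k                  # number of nearest clusters that vote
        self.clusters = []

    def partial_fit(self, vec, label):
        """Process one labeled document; calling this later updates the model incrementally."""
        best, best_sim = None, -1.0
        for c in self.clusters:
            if c.label != label:
                continue  # constraint: a cluster never mixes class labels
            s = cosine(vec, c.total)
            if s > best_sim:
                best, best_sim = c, s
        if best is not None and best_sim >= self.threshold:
            best.add(vec)
        else:
            self.clusters.append(Cluster(label, vec))

    def fit(self, docs, labels):
        for vec, label in zip(docs, labels):
            self.partial_fit(vec, label)
        return self

    def predict(self, vec):
        """Compare the query with cluster representatives, not with every training document."""
        if not self.clusters:
            raise ValueError("model has no clusters; call fit or partial_fit first")
        scored = sorted(((cosine(vec, c.total), c.label) for c in self.clusters), reverse=True)
        votes = defaultdict(float)
        for sim, label in scored[: self.k]:
            votes[label] += sim  # similarity-weighted vote
        return max(votes, key=votes.get)


# Toy usage with hand-made term-weight vectors.
docs = [
    {"ball": 1.0, "goal": 1.0},
    {"goal": 1.0, "match": 1.0},
    {"stock": 1.0, "market": 1.0},
    {"market": 1.0, "price": 1.0},
]
labels = ["sport", "sport", "finance", "finance"]
model = ClusterKNN(threshold=0.3, k=1).fit(docs, labels)
print(len(model.clusters))                          # 2
print(model.predict({"goal": 1.0, "ball": 1.0}))    # sport
```

In this toy run the four training documents collapse into two label-pure clusters, so classifying a query takes two similarity computations instead of four. `partial_fit` is what lets such a model absorb newly labeled documents without retraining from scratch, which is one plausible reading of the incremental-update property claimed above.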