Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification

Authors:
Eui-Hong Han;George Karypis;Vipin Kumar
Venue:
PAKDD '01 Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining
Year:
2001

Abstract

Text categorization presents unique challenges: the data sets have a large number of attributes and a large number of training samples, the attributes are interdependent, and the categories are multi-modal. Existing classification techniques have limited applicability to data sets of this nature. In this paper, we present Weight Adjusted k-Nearest Neighbor (WAKNN) classification, which learns feature weights using a greedy hill-climbing technique. We also present two performance optimizations of WAKNN that improve its computational performance by a few orders of magnitude without compromising classification quality. We experimentally evaluated WAKNN on 52 document data sets from a variety of domains and compared its performance against several classification algorithms, such as C4.5, RIPPER, Naive-Bayesian, PEBLS and VSM. Experimental results on these data sets confirm that WAKNN consistently outperforms other existing classification algorithms.
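
The abstract does not spell out the weight-learning procedure, so the following is only a minimal sketch of the general idea (a k-nearest-neighbor classifier whose feature weights are tuned by greedy hill climbing), not the authors' implementation. Everything specific here is an assumption made for illustration: the weighted cosine similarity, the leave-one-out accuracy objective, the multiplicative step factors, the similarity-weighted vote, and all function names.

```python
import math
from collections import Counter


def weighted_cosine(x, y, w):
    """Cosine similarity after scaling each feature by its weight (assumed form)."""
    dot = nx = ny = 0.0
    for xi, yi, wi in zip(x, y, w):
        a, b = xi * wi, yi * wi
        dot += a * b
        nx += a * a
        ny += b * b
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return dot / math.sqrt(nx * ny)


def knn_predict(query, docs, labels, w, k, skip=None):
    """Predict a label from the k most similar documents, voting by similarity."""
    sims = [(weighted_cosine(query, d, w), lab)
            for i, (d, lab) in enumerate(zip(docs, labels)) if i != skip]
    sims.sort(key=lambda s: s[0], reverse=True)
    votes = Counter()
    for sim, lab in sims[:k]:
        votes[lab] += sim
    # Iterate labels in sorted order so ties are broken deterministically.
    return max(sorted(votes), key=lambda lab: votes[lab])


def loo_accuracy(docs, labels, w, k):
    """Leave-one-out accuracy on the training set (hypothetical objective)."""
    hits = sum(knn_predict(d, docs, labels, w, k, skip=i) == labels[i]
               for i, d in enumerate(docs))
    return hits / len(docs)


def hill_climb_weights(docs, labels, k=1, factors=(0.5, 2.0), max_rounds=10):
    """Greedy hill climbing: each round applies the single weight change
    that improves the objective most, and stops when none improves it."""
    w = [1.0] * len(docs[0])
    best = loo_accuracy(docs, labels, w, k)
    for _ in range(max_rounds):
        best_move = None
        for j in range(len(w)):
            for f in factors:
                trial = w[:]
                trial[j] *= f
                score = loo_accuracy(docs, labels, trial, k)
                if score > best:
                    best, best_move = score, (j, f)
        if best_move is None:
            break
        j, f = best_move
        w[j] *= f
    return w, best


# Toy data: feature 0 is noise, features 1 and 2 mark classes "A" and "B".
docs = [[2, 1, 0], [0, 1, 0], [2, 0, 1], [0, 0, 1]]
labels = ["A", "A", "B", "B"]
weights, accuracy = hill_climb_weights(docs, labels, k=1)
print(weights, accuracy)
```

On this toy data the search down-weights the noisy first feature, after which each document's nearest neighbor shares its class. Each round re-evaluates the objective for every candidate change, which is costly on real document collections; the paper's two performance optimizations address that cost, but they are not described in the abstract and are not reproduced here.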