Efficient model selection for large-scale nearest-neighbor data mining
BNCOD '10: Proceedings of the 27th British National Conference on Databases (Data Security and Security Data)
We consider the relationship between training set size and the parameter k of the k-Nearest Neighbors (kNN) classifier. When few examples are available, accuracy is sensitive to the choice of k, and the best k tends to grow with training set size. This raises a risk: a k tuned on small partitions of the data may be suboptimal once the partitions are aggregated and the classifier is re-trained on the combined set. We find this risk to be most severe when little data is available; as the training set grows, accuracy becomes increasingly stable with respect to k and the risk diminishes.
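The effect described in the abstract can be illustrated with a minimal sketch (this is not the paper's experimental setup): kNN with k selected by leave-one-out accuracy, once on a small partition and once on the aggregated data. The 1-D synthetic dataset, the 20% label-noise rate, and the candidate grid of k values are all assumptions made for the illustration.

```python
import random
from collections import Counter

def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training points (1-D)."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def loo_accuracy(train, k):
    """Leave-one-out accuracy of kNN with a given k on a training set."""
    hits = 0
    for i, (x, y) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        hits += knn_predict(rest, x, k) == y
    return hits / len(train)

def best_k(train, candidates):
    """Pick the k that maximizes leave-one-out accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(train, k))

random.seed(0)

def sample(n):
    """Hypothetical data: class is the sign of x, with 20% label noise."""
    pts = []
    for _ in range(n):
        x = random.uniform(-1.0, 1.0)
        y = int(x > 0)
        if random.random() < 0.2:
            y = 1 - y
        pts.append((x, y))
    return pts

full = sample(200)
small = full[:20]  # one small "partition" of the full training set
ks = [1, 3, 5, 7, 9, 15, 25]

k_small = best_k(small, [k for k in ks if k < len(small)])
k_full = best_k(full, ks)
print("k tuned on 20 examples :", k_small)
print("k tuned on 200 examples:", k_full)
```

Under such a setup, the k chosen on the small partition need not match (and is often smaller than) the k chosen on the aggregated set, while on the larger set the leave-one-out curve over k is typically flatter, which is the stability the abstract reports.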