Efficient model selection for large-scale nearest-neighbor data mining
BNCOD '10: Proceedings of the 27th British National Conference on Databases (Data Security and Security Data)
We consider the relationship between training set size and the parameter k of the k-Nearest Neighbors (kNN) classifier. When few examples are available, accuracy is sensitive to the choice of k, and the best k tends to grow with training set size. This raises a risk: a k tuned on small partitions of the data may be suboptimal once the partitions are aggregated and the classifier is re-trained on the combined set. We find this risk to be most severe when little data is available; as the training set grows, accuracy becomes increasingly stable with respect to k and the risk diminishes.
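The effect described in the abstract can be illustrated with a minimal sketch (this is not the paper's experimental setup): kNN with k selected by leave-one-out accuracy, once on a small partition and once on the aggregated data. The 1-D synthetic dataset, the 20% label-noise rate, and the candidate grid of k values are all assumptions made for the illustration.

```python
import random
from collections import Counter

def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training points (1-D)."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def loo_accuracy(train, k):
    """Leave-one-out accuracy of kNN with a given k on a training set."""
    hits = 0
    for i, (x, y) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        hits += knn_predict(rest, x, k) == y
    return hits / len(train)

def best_k(train, candidates):
    """Pick the k that maximizes leave-one-out accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(train, k))

random.seed(0)

def sample(n):
    """Hypothetical data: class is the sign of x, with 20% label noise."""
    pts = []
    for _ in range(n):
        x = random.uniform(-1.0, 1.0)
        y = int(x > 0)
        if random.random() < 0.2:
            y = 1 - y
        pts.append((x, y))
    return pts

full = sample(200)
small = full[:20]  # one small "partition" of the full training set
ks = [1, 3, 5, 7, 9, 15, 25]

k_small = best_k(small, [k for k in ks if k < len(small)])
k_full = best_k(full, ks)
print("k tuned on 20 examples :", k_small)
print("k tuned on 200 examples:", k_full)
```

Under such a setup, the k chosen on the small partition need not match (and is often smaller than) the k chosen on the aggregated set, while on the larger set the leave-one-out curve over k is typically flatter, which is the stability the abstract reports.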