One of the most widely used models for large-scale data mining is the k-nearest neighbor (k-nn) algorithm. It can be used for classification, regression, density estimation, and information retrieval. To use k-nn, a practitioner must first choose k, usually by selecting the k with the lowest loss as estimated by cross-validation. In this work, we begin with an existing but little-studied method that greatly accelerates the cross-validation process for selecting k from a range of user-provided possibilities. As a result, a much larger range of k values can be examined in less time. Next, we extend this algorithm with an additional optimization that improves performance for locally linear regression problems. We also show how this method can be applied to automatically select the range of k values when the user has no a priori knowledge of appropriate bounds. Furthermore, we apply statistical methods to reduce the number of examples examined while still finding a likely best k, greatly improving performance for large data sets. Finally, we present both analytical and experimental results that demonstrate these benefits.
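The abstract does not spell out the accelerated procedure, so the following is only a minimal sketch of one common way to evaluate many k values in a single cross-validation pass, not necessarily the method used in the paper. It assumes leave-one-out cross-validation, brute-force Euclidean distances, and majority-vote classification. The function name `loo_error_for_all_k` and its parameters are hypothetical. The idea is that each held-out example's neighbors are ranked once, and the vote counts are updated incrementally as k grows, so every k from 1 to `k_max` shares the same neighbor search.

```python
import numpy as np


def loo_error_for_all_k(X, y, k_max):
    """Leave-one-out error rate of k-nn classification for k = 1..k_max.

    Assumption: brute-force distances and majority voting; ties are broken
    in favor of the lowest class index. Returns an array of length k_max
    whose entry k-1 is the error rate for that k.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[0]
    classes, y_idx = np.unique(y, return_inverse=True)

    # Pairwise squared Euclidean distances, computed once for all k.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a held-out point is never its own neighbor

    # Rank each point's neighbors once and keep the k_max closest.
    order = np.argsort(dist, axis=1, kind="stable")[:, :k_max]

    errors = np.zeros(k_max, dtype=int)
    for i in range(n):
        votes = np.zeros(len(classes), dtype=int)
        for k in range(k_max):
            # Adding the (k+1)-th neighbor turns the k-nn vote into the (k+1)-nn vote.
            votes[y_idx[order[i, k]]] += 1
            if np.argmax(votes) != y_idx[i]:
                errors[k] += 1
    return errors / n


if __name__ == "__main__":
    # Toy data: two well-separated one-dimensional clusters.
    X = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
    y = [0, 0, 0, 1, 1, 1]
    errs = loo_error_for_all_k(X, y, k_max=5)
    best_k = int(np.argmin(errs)) + 1
    print(errs, best_k)
```

In this sketch the neighbor ranking dominates the cost and is paid once, while each additional k adds only a constant-time vote update per example. A spatial index could replace the brute-force distance matrix for larger data sets.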