Bumptrees for efficient function, constraint, and classification learning
NIPS-3: Proceedings of the 1990 Conference on Advances in Neural Information Processing Systems 3
Implementing data cubes efficiently
SIGMOD '96: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data
Fast discovery of association rules
Advances in Knowledge Discovery and Data Mining
Accelerating exact k-means algorithms with geometric reasoning
KDD '99: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Very fast EM-based mixture model clustering using multiresolution kd-trees
Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems 11
An algorithm for finding best matches in logarithmic expected time
ACM Transactions on Mathematical Software (TOMS)
Efficient locally weighted polynomial regression predictions
ICML '97: Proceedings of the Fourteenth International Conference on Machine Learning
X-means: extending k-means with efficient estimation of the number of clusters
ICML '00: Proceedings of the Seventeenth International Conference on Machine Learning
M-tree: an efficient access method for similarity search in metric spaces
VLDB '97: Proceedings of the 23rd International Conference on Very Large Data Bases
Efficient regular data structures and algorithms for location and proximity problems
FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Cached sufficient statistics for efficient machine learning with large datasets
Journal of Artificial Intelligence Research
Multiresolution instance-based learning
IJCAI '95: Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2
On autonomous k-means clustering
ISMIS '05: Proceedings of the 15th International Conference on Foundations of Intelligent Systems
Query-driven iterated neighborhood graph search for large scale indexing
Proceedings of the 20th ACM International Conference on Multimedia
Posterior expectation of regularly paved random histograms
ACM Transactions on Modeling and Computer Simulation (TOMACS) - Special Issue on Monte Carlo Methods in Statistics
CID: an efficient complexity-invariant distance for time series
Data Mining and Knowledge Discovery
Using non-zero dimensions for the cosine and Tanimoto similarity search among real valued vectors
Fundamenta Informaticae - To Andrzej Skowron on His 70th Birthday
This paper is about metric data structures in high-dimensional or non-Euclidean spaces that permit cached-sufficient-statistics accelerations of learning algorithms. It has recently been shown that, for fewer than about 10 dimensions, decorating kd-trees with additional "cached sufficient statistics" such as first and second moments and contingency tables can provide satisfying acceleration for a very wide range of statistical learning tasks, such as kernel regression, locally weighted regression, k-means clustering, mixture modeling, and Bayes net learning. In this paper, we begin by defining the anchors hierarchy: a fast data structure and algorithm for localizing data based only on a triangle-inequality-obeying distance metric. We show how this, in its own right, gives a fast and effective clustering of data. More importantly, we show how it can produce a well-balanced structure similar to a ball-tree (Omohundro, 1991) or a kind of metric tree (Uhlmann, 1991; Ciaccia, Patella, & Zezula, 1997) in a way that is neither "top-down" nor "bottom-up" but instead "middle-out". We then show how this structure, decorated with cached sufficient statistics, allows a wide variety of statistical learning algorithms to be accelerated even in thousands of dimensions.
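The key idea behind the anchors construction can be sketched briefly: each anchor owns the points closest to it, kept sorted by decreasing distance, and when a new anchor is created the triangle inequality lets it skip whole tails of points that cannot possibly be closer to it than to their current owner. The sketch below is an illustrative reconstruction, not the paper's implementation; the function names, the dictionary layout, and the choice of Euclidean distance are assumptions (any metric obeying the triangle inequality would do).

```python
import math

def dist(p, q):
    # Euclidean distance; any triangle-inequality-obeying metric works here.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def build_anchors(points, n_anchors):
    """Grow n_anchors anchors over `points` (tuples of floats).

    Each anchor is {"pivot": point, "owned": [(d, p), ...]} with the owned
    list sorted by decreasing distance d to the pivot.  Assumes
    n_anchors <= len(points).
    """
    # The first anchor owns every point.
    pivot = points[0]
    anchors = [{
        "pivot": pivot,
        "owned": sorted(((dist(pivot, p), p) for p in points), reverse=True),
    }]
    while len(anchors) < n_anchors:
        # The next anchor is the point farthest from its current owner.
        donor = max(anchors,
                    key=lambda a: a["owned"][0][0] if a["owned"] else -1.0)
        new_pivot = donor["owned"][0][1]
        new_owned = []
        for a in anchors:
            d_between = dist(a["pivot"], new_pivot)
            kept = []
            for i, (d_p, p) in enumerate(a["owned"]):
                if d_between >= 2.0 * d_p:
                    # Triangle inequality: d(p, new) >= d_between - d_p >= d_p,
                    # so p cannot be stolen -- and because the owned list is
                    # sorted by decreasing d_p, neither can anything after it.
                    kept.extend(a["owned"][i:])
                    break
                d_new = dist(new_pivot, p)
                if d_new < d_p:
                    new_owned.append((d_new, p))   # stolen by the new anchor
                else:
                    kept.append((d_p, p))          # stays with its old owner
            a["owned"] = kept
        new_owned.sort(reverse=True)
        anchors.append({"pivot": new_pivot, "owned": new_owned})
    return anchors
```

The early `break` is where the metric-only pruning pays off: for well-separated data, most points are never compared against most anchors, which is what makes the construction cheap even when distances themselves are expensive.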