This paper introduces a practical index for approximate similarity queries on large multi-dimensional data sets: the spatial approximation sample hierarchy (SASH). A SASH is a multi-level structure of random samples, constructed recursively: a SASH is first built on a large, randomly selected sample of the data objects, and each remaining object is then connected to several of its approximate nearest neighbors from within that sample. Queries are processed by first locating approximate neighbors within the sample, and then following the pre-established connections to discover neighbors within the remainder of the data set. The SASH index relies on a pairwise distance measure but otherwise makes no assumptions about the representation of the data. Experimental results are provided for query-by-example operations on protein sequence, image, and text data sets, including one consisting of more than 1 million vectors spanning more than 1.1 million terms, far in excess of what spatial search indices can handle efficiently. For sets of this size, the SASH can return a large proportion of the true neighbors roughly two orders of magnitude faster than sequential search.
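The recursive construction and the sample-then-connections query described above can be illustrated with a minimal sketch. It is a simplified illustration under stated assumptions, not the published SASH algorithm: the level sizes (each sample about half the size of the level below, doubling from a single root), the fixed beam width, the class name `SimpleSASH`, and the parameters `parents` and `beam` are all hypothetical choices made here. The original also bounds the number of children per node, which this sketch omits; instead, if the beam search reaches no node at a level during construction, the sketch falls back to scanning that level. As in the abstract, only a pairwise distance function is required.

```python
import random
from typing import Callable, Dict, List, Sequence


class SimpleSASH:
    """Simplified SASH-style index (illustrative sketch, not the published algorithm).

    Levels are random samples whose sizes roughly double from a single root
    down to the full data set. Each object is linked to approximate nearest
    neighbours ("parents") in the level above, found by querying the part of
    the structure already built on the smaller sample.
    """

    def __init__(self, items: Sequence, dist: Callable, parents: int = 4,
                 beam: int = 16, seed: int = 0):
        self.items = list(items)
        self.dist = dist      # only a pairwise distance is required
        self.p = parents      # hypothetical parameter: parent links per object
        self.beam = beam      # hypothetical parameter: nodes kept per level
        order = list(range(len(self.items)))
        random.Random(seed).shuffle(order)
        # Level 0 holds one root; level i holds the next 2**i shuffled objects.
        self.levels: List[List[int]] = []
        start, size = 0, 1
        while start < len(order):
            self.levels.append(order[start:start + size])
            start, size = start + size, size * 2
        self.children: Dict[int, List[int]] = {i: [] for i in order}
        for lvl in range(1, len(self.levels)):
            for node in self.levels[lvl]:
                # Search the SASH built on levels 0..lvl-1 for this node.
                kept = self._descend(self.items[node], lvl - 1)
                # Fallback (assumption): scan the level if the beam found nothing.
                candidates = kept[lvl - 1] or self.levels[lvl - 1]
                for parent in self._nearest(self.items[node], candidates, self.p):
                    self.children[parent].append(node)

    def _nearest(self, q, ids, k: int) -> List[int]:
        # Brute-force k nearest among a small candidate set.
        return sorted(ids, key=lambda i: self.dist(q, self.items[i]))[:k]

    def _descend(self, q, last_level: int) -> List[List[int]]:
        """Beam search from the root, returning the nodes kept at each level."""
        frontier = list(self.levels[0])
        kept = [frontier]
        for lvl in range(1, last_level + 1):
            # Follow pre-established links from kept nodes to the next level.
            expanded = {c for f in frontier for c in self.children[f]}
            frontier = self._nearest(q, expanded, self.beam)
            kept.append(frontier)
        return kept

    def query(self, q, k: int) -> List[int]:
        """Return indices of k approximate nearest neighbours of q."""
        if not self.levels:
            return []
        kept = self._descend(q, len(self.levels) - 1)
        pool = {i for level in kept for i in level}
        return self._nearest(q, pool, k)
```

For example, `SimpleSASH(points, lambda a, b: abs(a - b), parents=3, beam=8).query(41.2, k=3)` returns the indices of three approximate nearest neighbours of `41.2` among a list of numbers `points`. The beam width trades accuracy for speed: with `beam` at least the data-set size every node is visited and the result is exact, while a small beam touches only a small fraction of the objects at each level.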