We study the so-called combinatorial framework for algorithmic problems in similarity spaces. Namely, the input dataset is represented by a comparison oracle that, given three points x, y, y', answers whether y or y' is closer to x. We assume that the similarity order of the dataset satisfies the four variations of the following disorder inequality: if x is the a'th most similar object to y and y is the b'th most similar object to z, then x is among the D(a + b) most similar objects to z, where D is a relatively small disorder constant. Though the oracle gives much less information than the standard general metric space model, where distance values are given, one can still design very efficient algorithms for various fundamental computational tasks. For nearest neighbor search we present a deterministic and exact algorithm with almost linear time and space complexity of preprocessing and near-logarithmic time complexity of search. Then, for near-duplicate detection we present the first known deterministic algorithm that requires just near-linear time plus time proportional to the size of the output. Finally, we show that for any dataset satisfying the disorder inequality a visibility graph can be constructed: all outdegrees are near-logarithmic and greedy routing deterministically converges to the nearest neighbor of a target in a logarithmic number of steps. The latter result is the first known work-around for Navarro's impossibility of generalizing Delaunay graphs. The technical contribution of the paper consists of handling "false positives" in data structures and an algorithmic technique called up-aside-down-filter.
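To make the comparison oracle and the disorder inequality concrete, here is a minimal Python sketch. It is not the paper's algorithm. It is a brute-force check of one variation of the inequality on a toy dataset. The helper names (`make_oracle`, `rank`, `disorder_constant`), the 1-based rank convention, the hidden absolute-difference distance, and the sample points are all assumptions made for illustration. The sketch also assumes there are no ties in the similarity order.

```python
from itertools import permutations


def make_oracle(dist):
    """Wrap a hidden distance function (an assumption of this sketch) as a
    comparison oracle: oracle(x, y, y2) is True if y is strictly closer to x
    than y2. Callers see only the comparison, never the distance values."""
    def oracle(x, y, y2):
        return dist(x, y) < dist(x, y2)
    return oracle


def rank(oracle, points, x, y):
    """Rank of y as seen from x, using only oracle calls: 1 plus the number
    of other points strictly closer to x than y (so the nearest point has
    rank 1; this convention is an assumption)."""
    return 1 + sum(1 for p in points if p != x and p != y and oracle(x, p, y))


def disorder_constant(oracle, points):
    """Smallest D such that rank_z(x) <= D * (rank_y(x) + rank_z(y)) holds
    for every ordered triple of distinct points. Brute force, cubic in the
    number of points times the cost of a rank computation."""
    worst = 0.0
    for x, y, z in permutations(points, 3):
        a = rank(oracle, points, y, x)  # x is the a'th most similar to y
        b = rank(oracle, points, z, y)  # y is the b'th most similar to z
        c = rank(oracle, points, z, x)  # position of x as seen from z
        worst = max(worst, c / (a + b))
    return worst


# Toy dataset: points on a line, compared by absolute difference.
points = [0, 1, 3, 7, 15]
oracle = make_oracle(lambda u, v: abs(u - v))
D = disorder_constant(oracle, points)
print("empirical disorder constant:", D)
```

The sketch only illustrates the definition. The abstract's claim is that, when D is small, nearest neighbor search and near-duplicate detection can be done far more efficiently than such exhaustive rank computations.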