GroupLens: applying collaborative filtering to Usenet news
Communications of the ACM
Two algorithms for nearest-neighbor search in high dimensions
STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
Min-wise independent permutations (extended abstract)
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Approximate nearest neighbors: towards removing the curse of dimensionality
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Efficient search for approximate nearest neighbor in high dimensional spaces
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Syntactic clustering of the Web
Selected papers from the sixth international conference on World Wide Web
Lower bounds for high dimensional nearest neighbor search and related problems
STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing
Foundations of statistical natural language processing
Measuring index quality using random walks on the Web
WWW '99 Proceedings of the eighth international conference on World Wide Web
Finding related pages in the World Wide Web
WWW '99 Proceedings of the eighth international conference on World Wide Web
Approximate nearest neighbor queries in fixed dimensions
SODA '93 Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms
The small-world phenomenon: an algorithmic perspective
STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
Closest pair queries in spatial databases
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
ACM Computing Surveys (CSUR)
Similarity estimation techniques from rounding algorithms
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Finding nearest neighbors in growth-restricted metrics
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Locally adaptive dimensionality reduction for indexing large time series databases
ACM Transactions on Database Systems (TODS)
Amazon.com Recommendations: Item-to-Item Collaborative Filtering
IEEE Internet Computing
Searching in Metric Spaces by Spatial Approximation
SPIRE '99 Proceedings of the String Processing and Information Retrieval Symposium & International Workshop on Groupware
Efficient similarity search and classification via rank aggregation
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Index-driven similarity search in metric spaces (Survey Article)
ACM Transactions on Database Systems (TODS)
A note on the nearest neighbor in growth-restricted metrics
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Navigating nets: simple algorithms for proximity search
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
RCV1: A New Benchmark Collection for Text Categorization Research
The Journal of Machine Learning Research
Dictionary matching and indexing with errors and don't cares
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Improved robustness of signature-based near-replica detection via lexicon randomization
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
UnitWalk: A New SAT Solver that Uses Local Search Guided by Unit Clause Elimination
Annals of Mathematics and Artificial Intelligence
Distance estimation and object location via rings of neighbors
Proceedings of the twenty-fourth annual ACM symposium on Principles of distributed computing
Detecting phrase-level duplication on the world wide web
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Meridian: a lightweight network location service without virtual coordinates
Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications
Similarity Search: The Metric Space Approach (Advances in Database Systems)
Fast Construction of Nets in Low-Dimensional Metrics and Their Applications
SIAM Journal on Computing
Searching dynamic point sets in spaces with bounded doubling dimension
Proceedings of the thirty-eighth annual ACM symposium on Theory of computing
The black-box complexity of nearest-neighbor search
Theoretical Computer Science - Automata, languages and programming: Algorithms and complexity (ICALP-A 2004)
Detecting spam web pages through content analysis
Proceedings of the 15th international conference on World Wide Web
Cover trees for nearest neighbor
ICML '06 Proceedings of the 23rd international conference on Machine learning
Graph-based text classification: learn from your neighbors
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Routing in Networks with Low Doubling Dimension
ICDCS '06 Proceedings of the 26th IEEE International Conference on Distributed Computing Systems
Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions
FOCS '06 Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science
Do not crawl in the dust: different urls with similar text
Proceedings of the 16th international conference on World Wide Web
Local embeddings of metric spaces
Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
Learning random walks to rank nodes in graphs
Proceedings of the 24th international conference on Machine learning
ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
A doubling dimension threshold Θ(log log n) for augmented graph navigability
ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
A divide and conquer algorithm for d-dimensional arrangement
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Disorder inequality: a combinatorial approach to nearest neighbor search
WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining
Embedding metric spaces in their intrinsic dimension
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
A discriminative framework for clustering via similarity functions
STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Introduction to Information Retrieval
Combinatorial algorithms for nearest neighbors, near-duplicates and small-world design
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
MESSIF: metric similarity search implementation framework
DELOS'07 Proceedings of the 1st international conference on Digital libraries: research and development
Estimation of the click volume by large scale regression analysis
CSR'07 Proceedings of the Second international conference on Computer Science: theory and applications
Maximal intersection queries in randomized graph models
CSR'07 Proceedings of the Second international conference on Computer Science: theory and applications
SIGSPATIAL Special
Nearest neighbor search: algorithmic perspective
SIGSPATIAL Special
On nonmetric similarity search problems in complex domains
ACM Computing Surveys (CSUR)
We present an overview of the combinatorial framework for similarity search. An algorithm is combinatorial if the only operation it performs on similarity values is a direct comparison of two of them. Namely, the input dataset is represented by a comparison oracle that, given any three points X, Y, Z, answers whether Y or Z is closer to X. We assume that the similarity order of the dataset satisfies four variations of the following disorder inequality: if X is the A'th most similar object to Y and Y is the B'th most similar object to Z, then X is among the D(A+B) most similar objects to Z, where D is a relatively small disorder constant. Combinatorial algorithms for nearest neighbor search have two important advantages: (1) they do not map similarity values to artificial distance values and do not rely on the triangle inequality for such distances, and (2) they work for arbitrarily complicated data representations and similarity functions. Ranwalk, the first known combinatorial solution for nearest neighbors, is a randomized, exact, zero-error algorithm whose query time is logarithmic in the number of objects. However, its preprocessing time is quadratic. Later, another solution, called combinatorial nets, was discovered. It is a deterministic, exact algorithm with almost linear preprocessing time and space and near-logarithmic search time. Combinatorial nets also have a number of side applications. For near-duplicate detection, they lead to the first known deterministic algorithm that requires only near-linear time plus time proportional to the size of the output. For any dataset with small disorder, combinatorial nets can be used to construct a visibility graph: a graph in which greedy routing deterministically converges to the nearest neighbor of a target in a logarithmic number of steps. The latter result is the first known workaround for Navarro's impossibility result on generalizing Delaunay graphs.
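The comparison-oracle model and the disorder inequality can be made concrete with a small Python sketch. It assumes a hidden distance function wrapped as a boolean oracle, and every name in it (make_oracle, ranks_from, disorder_constant, greedy_route) is hypothetical. The disorder computation checks only the single variation of the inequality quoted in the abstract, not all four. The path graph in the demo is an illustrative stand-in, not the visibility graph built from combinatorial nets, and neither Ranwalk nor combinatorial nets is implemented here.

```python
from functools import cmp_to_key
from itertools import permutations


def make_oracle(dist):
    """Wrap a hidden similarity function as a comparison oracle.

    closer(x, y, z) is True iff y is strictly closer to x than z is.
    The functions below call only this predicate, never dist itself.
    """
    def closer(x, y, z):
        return dist(x, y) < dist(x, z)
    return closer


def ranks_from(center, objects, closer):
    """Rank every other object by similarity to `center` (1 = most similar),
    using only oracle comparisons. Ties keep their input order (stable sort)."""
    def cmp(a, b):
        if closer(center, a, b):
            return -1
        if closer(center, b, a):
            return 1
        return 0
    others = [o for o in objects if o != center]
    ordered = sorted(others, key=cmp_to_key(cmp))
    return {o: i + 1 for i, o in enumerate(ordered)}


def disorder_constant(objects, closer):
    """Smallest D satisfying one variation of the disorder inequality,
    rank_Z(X) <= D * (rank_Y(X) + rank_Z(Y)), over all distinct X, Y, Z.
    Brute force (n sorts plus O(n^3) triples), for illustration only."""
    rank = {c: ranks_from(c, objects, closer) for c in objects}
    d = 0.0
    for x, y, z in permutations(objects, 3):
        a = rank[y][x]  # X is the A'th most similar object to Y
        b = rank[z][y]  # Y is the B'th most similar object to Z
        d = max(d, rank[z][x] / (a + b))
    return d


def greedy_route(graph, start, query, closer):
    """From `start`, repeatedly move to the neighbor most similar to `query`;
    stop when no neighbor is strictly closer than the current node.
    Each move strictly improves closeness, so the walk always terminates.
    Returns the list of visited nodes."""
    path = [start]
    current = start
    while True:
        best = current
        for nb in graph[current]:
            if closer(query, nb, best):
                best = nb
        if best == current:
            return path
        current = best
        path.append(current)


if __name__ == "__main__":
    points = [0.0, 1.0, 2.5, 4.0, 7.0, 11.0]
    closer = make_oracle(lambda u, v: abs(u - v))
    print("disorder constant (one variation):", disorder_constant(points, closer))
    # Hypothetical graph: a path linking consecutive points on the line.
    graph = {p: [] for p in points}
    for u, v in zip(points, points[1:]):
        graph[u].append(v)
        graph[v].append(u)
    print("greedy path toward 6.2:", greedy_route(graph, 0.0, 6.2, closer))
```

On this 1-D path graph the distance to the query first decreases and then increases along the path, so greedy routing reaches the nearest neighbor (here 7.0). On an arbitrary graph the same walk can stop at a local minimum, which is the gap a visibility graph is meant to close.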