Efficient similarity search and classification via rank aggregation

Authors:
Ronald Fagin; Ravi Kumar; D. Sivakumar
Affiliations:
IBM Almaden Research Center, San Jose, CA; IBM Almaden Research Center, San Jose, CA; IBM Almaden Research Center, San Jose, CA
Venue:
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Year:
2003

Abstract

We propose a novel approach to performing efficient similarity search and classification in high dimensional data. In this framework, the database elements are vectors in a Euclidean space. Given a query vector in the same space, the goal is to find elements of the database that are similar to the query. In our approach, a small number of independent "voters" rank the database elements based on similarity to the query. These rankings are then combined by a highly efficient aggregation algorithm. Our methodology leads both to techniques for computing approximate nearest neighbors and to a conceptually rich alternative to nearest neighbors.

One instantiation of our methodology is as follows. Each voter projects all the vectors (database elements and the query) on a random line (different for each voter), and ranks the database elements based on the proximity of the projections to the projection of the query. The aggregation rule picks the database element that has the best median rank. This combination has several appealing features. On the theoretical side, we prove that with high probability, it produces a result that is a (1 + ε) factor approximation to the Euclidean nearest neighbor. On the practical side, it turns out to be extremely efficient, often exploring no more than 5% of the data to obtain very high-quality results. This method is also database-friendly, in that it accesses data primarily in a pre-defined order without random accesses, and, unlike other methods for approximate nearest neighbors, requires almost no extra storage. Also, we extend our approach to deal with the k nearest neighbors.

We conduct two sets of experiments to evaluate the efficacy of our methods. Our experiments include two scenarios where nearest neighbors are typically employed: similarity search and classification problems. In both cases, we study the performance of our methods with respect to several evaluation criteria, and conclude that they are uniformly excellent, both in terms of quality of results and in terms of efficiency.
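
The instantiation described in the abstract (random-line voters plus a best-median-rank rule) can be illustrated with a minimal brute-force sketch. This is not the authors' implementation: it ranks every database element for every voter, so it shows only the voting rule and none of the efficiency properties claimed above (sorted access, touching a small fraction of the data). The function names, the default of 15 voters, and the use of normalized Gaussian samples to pick a random direction are assumptions made for illustration.

```python
import math
import random
import statistics


def random_unit_vector(dim, rng):
    """Draw a random direction by normalizing Gaussian samples (an assumed choice)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project(vec, direction):
    """Scalar projection of vec onto the line spanned by direction."""
    return sum(a * b for a, b in zip(vec, direction))


def median_rank_neighbor(database, query, num_voters=15, seed=0):
    """Return the index of the database vector with the best median rank.

    Each voter projects the query and all database vectors onto its own
    random line and ranks the database vectors by how close their
    projections are to the query's projection (rank 0 is closest).
    """
    rng = random.Random(seed)
    n = len(database)
    dim = len(query)
    ranks = [[] for _ in range(n)]  # ranks[i] collects one rank per voter

    for _ in range(num_voters):
        direction = random_unit_vector(dim, rng)
        q_proj = project(query, direction)
        gaps = [abs(project(v, direction) - q_proj) for v in database]
        order = sorted(range(n), key=gaps.__getitem__)
        for rank, idx in enumerate(order):
            ranks[idx].append(rank)

    # Aggregation rule: smallest median rank wins (ties go to the lower index).
    medians = [statistics.median(r) for r in ranks]
    return min(range(n), key=medians.__getitem__)
```

Calling `median_rank_neighbor(database, query)` with `database` as a list of equal-length numeric lists returns an index into `database`. An efficient variant would avoid ranking the full database for every voter, which this sketch does not attempt.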