Learning embeddings for indexing, retrieval, and classification, with applications to object and shape recognition in image databases

  • Authors:
  • Stan Sclaroff; Vassilis Athitsos

  • Affiliations:
  • Boston University; Boston University

  • Venue:
  • Doctoral thesis, Boston University
  • Year:
  • 2006

Abstract

Nearest neighbor retrieval is the task of identifying, given a database of objects and a query object, the objects in the database that are the most similar to the query. Retrieving nearest neighbors is a necessary component of many practical applications, in fields as diverse as computer vision, pattern recognition, multimedia databases, bioinformatics, and computer networks. This thesis proposes new methods for improving the efficiency and accuracy of nearest neighbor retrieval and classification in spaces with computationally expensive distance measures. The first contribution of this thesis is the BoostMap algorithm for embedding arbitrary spaces into a vector space with a computationally efficient distance measure. Using this approach, an approximate set of nearest neighbors can be retrieved efficiently---often orders of magnitude faster than retrieval using the exact distance measure in the original space. In BoostMap, embedding construction is treated as a machine learning problem. The learning-based formulation leads to an algorithm that directly maximizes the amount of nearest neighbor structure preserved by the embedding, without making any assumptions about the underlying geometry of the original space. The second contribution consists of extending BoostMap to produce, together with the embedding, a query-sensitive distance measure for the target space of the embedding. In high-dimensional spaces, query-sensitive distance measures allow for automatic selection of the dimensions that are the most informative for each specific query object. The third contribution is a method for speeding up nearest neighbor classification by combining multiple embedding-based nearest neighbor classifiers in a cascade structure. In cascade-based classification, computationally efficient classifiers are used to quickly classify easy cases, and classifiers that are more computationally expensive and also more accurate are only applied to objects that are harder to classify. The proposed methods are evaluated experimentally in several different applications: hand shape recognition, offline character recognition, online character recognition, and efficient retrieval of time series. In all datasets, the proposed methods lead to significant improvements in accuracy and efficiency compared to existing state-of-the-art methods.
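The sketch below illustrates the generic filter-and-refine retrieval pipeline that an embedding such as the one described above plugs into: database objects are embedded offline into a vector space, a cheap (optionally weighted) L1 distance ranks candidates, and the expensive exact distance is applied only to a short candidate list. This is a minimal illustration under generic assumptions, not the BoostMap construction itself (BoostMap learns the embedding, and its query-sensitive extension learns per-query weights, via boosting); all function and parameter names here are hypothetical.

```python
# Minimal sketch of embedding-based filter-and-refine nearest neighbor retrieval.
# Not the thesis's BoostMap algorithm; identifiers (embed, filter_and_refine,
# expensive_dist, ...) are hypothetical.

import numpy as np

def embed(x, references, expensive_dist):
    """Map object x to a vector of its distances to a few reference objects."""
    return np.array([expensive_dist(x, r) for r in references])

def filter_and_refine(query, database, db_vectors, references, expensive_dist,
                      filter_k=100, k=5, weights=None):
    """Approximate k-NN retrieval.

    db_vectors: embeddings of all database objects, precomputed offline.
    weights:    optional per-coordinate weights; a query-sensitive distance
                would choose these per query instead of fixing them globally.
    """
    q = embed(query, references, expensive_dist)
    w = np.ones(len(q)) if weights is None else np.asarray(weights)
    # Filter step: cheap (weighted) L1 distance in the embedding space.
    cheap = (w * np.abs(db_vectors - q)).sum(axis=1)
    candidates = np.argsort(cheap)[:filter_k]
    # Refine step: evaluate the expensive exact distance only on the candidates.
    exact = sorted((expensive_dist(query, database[i]), i) for i in candidates)
    return [i for _, i in exact[:k]]
```

In this scheme the database embeddings are computed once offline, so answering a query requires the exact distance only for the reference objects and the filter_k candidates rather than for every database object, which is where the orders-of-magnitude savings described in the abstract would come from.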