Large-scale document image retrieval and classification with runlength histograms and binary embeddings

  • Authors:
  • Albert Gordo; Florent Perronnin; Ernest Valveny

  • Affiliations:
  • Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain; Xerox Research Centre Europe (XRCE), Grenoble, France; Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain

  • Venue:
  • Pattern Recognition
  • Year:
  • 2013

Abstract

We present a new document image descriptor based on multi-scale runlength histograms. This descriptor does not rely on layout analysis and can be computed efficiently. We show that it achieves state-of-the-art results in both classification and retrieval on two very different public datasets. Moreover, we show how these descriptors can be compressed and binarized to make them suitable for large-scale applications: in classification, binary descriptors of as few as 16-64 bits still achieve state-of-the-art results.
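To make the two ideas in the abstract concrete, here is a minimal, dependency-free sketch of (1) a multi-scale runlength histogram for a binary page image and (2) a compact binary code compared with the Hamming distance. The abstract does not specify the configuration, so everything here is an assumption for illustration: the four directions (horizontal, vertical, diagonal, anti-diagonal), the five log-scale length bins, the two scales obtained by 2x2 OR-pooling, and the helper names (`run_lengths`, `quantize`, `lines`, `downsample`, `runlength_descriptor`, `random_hyperplanes`, `binarize`, `hamming`) are all hypothetical. The binarization step swaps in random-hyperplane hashing (sign of a random projection of the mean-centered descriptor) as a simple, standard stand-in; it is not necessarily the compression scheme used by the authors.

```python
import random
from typing import List, Tuple

Image = List[List[int]]  # binary page image: 1 = black (ink), 0 = white

def run_lengths(seq: List[int]) -> List[Tuple[int, int]]:
    """Return (pixel value, length) for every maximal run of equal values."""
    runs: List[Tuple[int, int]] = []
    for v in seq:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

def quantize(length: int, num_bins: int) -> int:
    """Log-scale bin (assumed): 1->0, 2->1, 3-4->2, 5-8->3, ...; long runs share the last bin."""
    return min((length - 1).bit_length(), num_bins - 1)

def lines(img: Image, direction: str) -> List[List[int]]:
    """Pixel sequences along horizontal, vertical, diagonal or anti-diagonal lines."""
    h, w = len(img), len(img[0])
    if direction == "h":
        return [list(row) for row in img]
    if direction == "v":
        return [[img[y][x] for y in range(h)] for x in range(w)]
    if direction == "d":  # top-left to bottom-right: x = y - k
        return [[img[y][y - k] for y in range(h) if 0 <= y - k < w]
                for k in range(-(w - 1), h)]
    # "a": top-right to bottom-left: x = s - y
    return [[img[y][s - y] for y in range(h) if 0 <= s - y < w]
            for s in range(h + w - 1)]

def downsample(img: Image) -> Image:
    """Halve the resolution; a 2x2 block is black if any of its pixels is black."""
    h, w = len(img), len(img[0])
    return [[max(img[y][x]
                 for y in range(2 * yy, min(2 * yy + 2, h))
                 for x in range(2 * xx, min(2 * xx + 2, w)))
             for xx in range((w + 1) // 2)]
            for yy in range((h + 1) // 2)]

def runlength_descriptor(img: Image, num_bins: int = 5, num_scales: int = 2) -> List[float]:
    """Concatenate L1-normalized runlength histograms over 4 directions and several scales."""
    feats: List[float] = []
    for scale in range(num_scales):
        for d in ("h", "v", "d", "a"):
            hist = [0.0] * (2 * num_bins)  # white-run bins first, then black-run bins
            for line in lines(img, d):
                for value, length in run_lengths(line):
                    hist[value * num_bins + quantize(length, num_bins)] += 1
            total = sum(hist)
            feats.extend(c / total for c in hist)
        if scale < num_scales - 1:
            img = downsample(img)
    return feats

def random_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """One Gaussian random direction per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def binarize(feat: List[float], planes: List[List[float]], mean: List[float]) -> int:
    """Bit i is set when the mean-centered descriptor lies on the positive side of plane i."""
    code = 0
    for i, plane in enumerate(planes):
        proj = sum(p * (f - m) for p, f, m in zip(plane, feat, mean))
        if proj > 0:
            code |= 1 << i
    return code

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")

# Usage on two toy 4x4 "pages"; in practice the mean would be estimated on a training collection.
page_a = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1]]
page_b = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1]]
desc_a, desc_b = runlength_descriptor(page_a), runlength_descriptor(page_b)
mean = [(x + y) / 2 for x, y in zip(desc_a, desc_b)]
planes = random_hyperplanes(len(desc_a), n_bits=32)
code_a, code_b = binarize(desc_a, planes, mean), binarize(desc_b, planes, mean)
print(len(desc_a), hamming(code_a, code_b))
```

With these assumed settings the descriptor has 2 scales x 4 directions x 2 colors x 5 bins = 80 dimensions, and each image is reduced to a single 32-bit integer, so comparing two documents costs one XOR and a popcount. Because the histograms count runs rather than segmenting text blocks, no layout analysis is needed, which is consistent with the property claimed in the abstract.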