Large scalability in document image matching using text retrieval

  • Authors: Jorge Moraleda
  • Affiliations: Ricoh Innovations Inc., 2882 Sand Hill Road Suite 115, Menlo Park, CA 94025, USA
  • Venue: Pattern Recognition Letters
  • Year: 2012

Abstract

We present a method that addresses image matching from partial, blurry images by casting it as a problem of text retrieval. This allows us to leverage existing text document retrieval techniques and achieve efficiency and scalability similar to text search applications. As an initial application, we present a document image matching system in which the user supplies a query image of a small patch of a paper document taken with a cell phone camera, and the system returns a label identifying the original electronic document if it is found in a previously indexed collection. We have implemented our method in a client-server architecture. Feature computation on a mobile client takes under 100 ms, while end-to-end document recognition on a collection of more than 4300 pages requires approximately 500 ms per image. Approximately 170 ms of that is connection time and thus subject to network speed variations. We conclude by presenting scalability results on a collection of nearly 500,000 documents.
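
The abstract does not say how image features are turned into text or how matches are scored, so the following is only a minimal sketch of the general idea of treating image matching as text retrieval. It assumes each image patch has already been reduced to a list of discrete tokens ("synthetic words"). The class name `TokenIndex`, the token strings, and the TF-IDF style scoring are hypothetical choices for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


class TokenIndex:
    """Toy inverted index over tokens assumed to be derived from image features."""

    def __init__(self):
        # token -> {document label: number of occurrences in that document}
        self.postings = defaultdict(dict)
        self.doc_count = 0

    def add(self, label, tokens):
        """Index one document page under the given label."""
        self.doc_count += 1
        for token, tf in Counter(tokens).items():
            self.postings[token][label] = tf

    def query(self, tokens):
        """Return the best-scoring label, or None if no query token is indexed."""
        scores = Counter()
        for token in set(tokens):
            docs = self.postings.get(token)
            if not docs:
                continue  # token never seen at indexing time
            # Rare tokens count more (assumed TF-IDF style weighting).
            idf = math.log(1 + self.doc_count / len(docs))
            for label, tf in docs.items():
                scores[label] += tf * idf
        return scores.most_common(1)[0][0] if scores else None


index = TokenIndex()
index.add("page-A", ["w17", "w42", "w42", "w99"])
index.add("page-B", ["w03", "w17", "w58"])

# A partial query: two tokens come from page-A, one is unknown noise.
print(index.query(["w42", "w99", "w00"]))  # prints "page-A"
```

Because lookups touch only the posting lists of the query tokens, the cost of a query depends on those lists rather than on the full collection size, which is the property that lets text search engines scale to large collections.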