A figure search engine architecture for a chemistry digital library

  • Authors:
  • Sagnik Ray Choudhury;Suppawong Tuarob;Prasenjit Mitra;Lior Rokach;Andi Kirk;Silvia Szep;Donald Pellegrino;Sue Jones;Clyde Lee Giles

  • Affiliations:
  • The Pennsylvania State University, University Park, PA, USA;The Pennsylvania State University, University Park, PA, USA;The Pennsylvania State University, University Park, PA, USA;Ben-Gurion University of the Negev, Beer Sheva, Israel;The Dow Chemical Company, Spring House, PA, USA;The Dow Chemical Company, Spring House, PA, USA;The Dow Chemical Company, Spring House, PA, USA;The Dow Chemical Company, Spring House, PA, USA;The Pennsylvania State University, University Park, PA, USA

  • Venue:
  • Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
  • Year:
  • 2013

Abstract

Academic papers contain multiple figures representing important findings and experimental results; we present a search engine focused specifically on figures in academic documents. The search engine allows users to search figures in approximately 150,000 chemistry journal articles, though the method is easily extendable to other domains. Our system indexes figure captions and mentions extracted from PDF documents using a custom-built extractor. Recall and precision of figure extraction are in the 80 to 90% range. We describe the framework for the extraction algorithm, the system architecture, and the ranking function.
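The abstract states that the system indexes each figure's caption and its mentions in the body text and ranks results, but it does not give the ranking function itself. As a rough illustration only, the Python sketch below treats the caption and the mentions as two separate fields of a figure record and scores a query with a field-weighted TF-IDF sum. The `Figure` and `FigureIndex` names, the tokenizer, and the `FIELD_WEIGHTS` values are hypothetical assumptions, not the authors' ranking function.

```python
import math
import re
from collections import Counter, defaultdict
from dataclasses import dataclass, field

# Hypothetical per-field weights: an illustrative assumption, not the paper's values.
FIELD_WEIGHTS = {"caption": 2.0, "mentions": 1.0}

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and digits as terms."""
    return TOKEN_RE.findall(text.lower())


@dataclass
class Figure:
    fig_id: str
    caption: str
    # Sentences from the article body that refer to this figure.
    mentions: list[str] = field(default_factory=list)


class FigureIndex:
    """Toy inverted index over figure captions and mentions (hypothetical design)."""

    def __init__(self) -> None:
        # term -> {fig_id -> Counter({field_name: term frequency})}
        self.postings = defaultdict(lambda: defaultdict(Counter))
        self.num_figures = 0

    def add(self, fig: Figure) -> None:
        self.num_figures += 1
        fields = {"caption": fig.caption, "mentions": " ".join(fig.mentions)}
        for name, text in fields.items():
            for term in tokenize(text):
                self.postings[term][fig.fig_id][name] += 1

    def search(self, query: str, k: int = 10) -> list[tuple[str, float]]:
        scores: dict[str, float] = defaultdict(float)
        for term in set(tokenize(query)):
            docs = self.postings.get(term)
            if not docs:
                continue
            # Smoothed IDF: terms found in every figure still contribute a little.
            idf = math.log(1 + self.num_figures / len(docs))
            for fig_id, field_tf in docs.items():
                for name, tf in field_tf.items():
                    # Sublinear TF, scaled by the (assumed) weight of the field.
                    scores[fig_id] += FIELD_WEIGHTS[name] * (1 + math.log(tf)) * idf
        # Highest score first; ties broken by figure id for stable output.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    index = FigureIndex()
    index.add(Figure("f1", "Polymer viscosity versus temperature",
                     ["Figure 1 shows viscosity rising with temperature."]))
    index.add(Figure("f2", "Crystal structure of the catalyst",
                     ["The catalyst structure is shown in Figure 2."]))
    print(index.search("catalyst structure"))
```

Weighting caption matches above mention matches reflects the intuition that a caption describes the figure directly while mentions supply surrounding context; the actual fields, weights, and scoring used by the system would have to come from the full paper.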