Invited paper: Structured literature image finder: Parsing text and figures in biomedical literature

  • Authors:
  • Amr Ahmed;Andrew Arnold;Luis Pedro Coelho;Joshua Kangas;Abdul-Saboor Sheikh;Eric Xing;William Cohen;Robert F. Murphy

  • Affiliations:
  • Machine Learning Department, Carnegie Mellon University, United States and Language Technologies Institute, Carnegie Mellon University, United States;Machine Learning Department, Carnegie Mellon University, United States;Joint Carnegie Mellon University-University of Pittsburgh Ph.D. Program in Computational Biology, United States and Center for Bioimage Informatics, Carnegie Mellon University, United States and L ...;Joint Carnegie Mellon University-University of Pittsburgh Ph.D. Program in Computational Biology, United States and Center for Bioimage Informatics, Carnegie Mellon University, United States and L ...;Center for Bioimage Informatics, Carnegie Mellon University, United States;Machine Learning Department, Carnegie Mellon University, United States and Language Technologies Institute, Carnegie Mellon University, United States and Joint Carnegie Mellon University-Universit ...;Machine Learning Department, Carnegie Mellon University, United States and Language Technologies Institute, Carnegie Mellon University, United States and Joint Carnegie Mellon University-Universit ...;Machine Learning Department, Carnegie Mellon University, United States and Joint Carnegie Mellon University-University of Pittsburgh Ph.D. Program in Computational Biology, United States and Cente ...

  • Venue:
  • Web Semantics: Science, Services and Agents on the World Wide Web
  • Year:
  • 2010

Abstract

The SLIF project combines text mining and image processing to extract structured information from biomedical literature. SLIF extracts images and their captions from published papers. The captions are automatically parsed for relevant biological entities (protein and cell type names), while the images are classified according to their type (e.g., micrograph or gel). Fluorescence microscopy images are further processed and classified according to the depicted subcellular localization. The results of this process can be queried online using either a user-friendly web interface or an XML-based web service. As an alternative to the targeted query paradigm, SLIF also supports browsing the collection based on latent topic models that are derived from both the annotated text and the image data. The SLIF web application, as well as labeled datasets used for training system components, is publicly available at http://slif.cbi.cmu.edu.