Structured literature image finder: extracting information from text and images in biomedical literature

Authors:
Luís Pedro Coelho;Amr Ahmed;Andrew Arnold;Joshua Kangas;Abdul-Saboor Sheikh;Eric P. Xing;William W. Cohen;Robert F. Murphy
Affiliations:
Lane Center for Computational Biology, Carnegie Mellon University;Machine Learning Department, Carnegie Mellon University;Machine Learning Department, Carnegie Mellon University;Lane Center for Computational Biology, Carnegie Mellon University;Center for Bioimage Informatics, Carnegie Mellon University;Lane Center for Computational Biology, Carnegie Mellon University;Lane Center for Computational Biology, Carnegie Mellon University;Lane Center for Computational Biology, Carnegie Mellon University
Venue:
ISMB/ECCB'09 Proceedings of the 2009 workshop of the BioLink Special Interest Group, international conference on Linking Literature, Information, and Knowledge for Biology
Year:
2009

Abstract

SLIF uses a combination of text mining and image processing to extract information from figures in the biomedical literature. It also uses innovative extensions to traditional latent topic modeling to provide new ways to traverse the literature. SLIF provides a publicly available searchable database (http://slif.cbi.cmu.edu). SLIF originally focused on fluorescence microscopy images; we have now extended it to classify panels into more image types. We also improved the classification into subcellular classes by building a more representative training set. To get the most out of the human labeling effort, we used active learning to select which images to label. We developed models that take into account the structure of the document (panels inside figures inside papers) and the multi-modality of the information (free and annotated text, images, and information from external databases). These models have allowed us to provide new ways to navigate a large collection of documents.
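
The abstract does not state which active learning criterion was used to select images for labeling. Purely as an illustration of the general idea, the following minimal sketch uses margin-based uncertainty sampling, which is a common stand-in technique and not necessarily the authors' method. The panel identifiers, class probabilities, and function names are all hypothetical.

```python
from typing import Dict, List, Sequence


def margin(probs: Sequence[float]) -> float:
    """Gap between the two most probable classes; a small gap means the
    classifier is unsure which class the panel belongs to."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_for_labeling(
    predictions: Dict[str, Sequence[float]], budget: int
) -> List[str]:
    """Return the `budget` panel ids with the smallest margin.

    Ties are broken by panel id so the result is deterministic.
    """
    ranked = sorted(predictions, key=lambda pid: (margin(predictions[pid]), pid))
    return ranked[:budget]


# Hypothetical class probabilities from some panel classifier
# (three classes per panel; values invented for illustration).
predictions = {
    "panel_a": [0.90, 0.05, 0.05],
    "panel_b": [0.40, 0.35, 0.25],
    "panel_c": [0.55, 0.40, 0.05],
    "panel_d": [0.34, 0.33, 0.33],
}

# Ask a human annotator to label the two most ambiguous panels next.
chosen = select_for_labeling(predictions, budget=2)
print(chosen)
```

In a full loop, the newly labeled panels would be added to the training set, the classifier retrained, and the selection repeated until the labeling budget is spent.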