Understanding captions in biomedical publications

Authors:
William W. Cohen; Richard Wang; Robert F. Murphy
Affiliations:
Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA
Venue:
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2003

Abstract

From the standpoint of automated extraction of scientific knowledge, the figures and their accompanying captions are an important but little-studied part of scientific publications. Captions are dense in information but also contain many extra-grammatical constructs, making them awkward to process with standard information extraction methods. We propose a scheme for "understanding" captions in biomedical publications by extracting and classifying "image pointers" (references to the accompanying image). We evaluate a number of automated methods for this task, including hand-coded methods, methods based on existing learning techniques, and methods based on novel learning techniques. The best of these methods leads to a usefully accurate tool for caption-understanding, with both recall and precision in excess of 94% on the most important single class in a combined extraction/classification task.
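
To make the task concrete, the following is a minimal sketch of what a hand-coded extractor and its span-level evaluation might look like. It is not the method evaluated in the paper: the pattern (parenthesized panel labels such as "(A)" or "(A-C)"), the function names `extract_image_pointers` and `precision_recall`, and the sample caption are all assumptions made for illustration, and the classification step is omitted.

```python
import re

# Hypothetical pattern: a parenthesized single letter, optionally followed by
# a range or list partner, e.g. "(A)", "(b)", "(A-C)", "(A, B)".
# This is an illustrative assumption, not the rule set used in the paper.
POINTER_RE = re.compile(r"\(([A-Za-z])(?:\s*[-,]\s*([A-Za-z]))?\)")


def extract_image_pointers(caption):
    """Return (start, end, text) spans for candidate image pointers."""
    return [(m.start(), m.end(), m.group(0)) for m in POINTER_RE.finditer(caption)]


def precision_recall(predicted, gold):
    """Compute span-level precision and recall from two collections of spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up caption for illustration; "(n = 3)" should not be treated as a pointer.
caption = "(A) Control cells. (B) Cells treated with drug X (n = 3)."
spans = extract_image_pointers(caption)
```

Here `spans` holds the character offsets and text of each candidate pointer, and `precision_recall` can compare offset pairs from an extractor against hand-labeled offsets. A pattern this simple would also match parenthesized letters that are not image pointers, which is one reason learned extractors are worth evaluating against hand-coded ones.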