Using non-lexical features to identify effective indexing terms for biomedical illustrations

Authors:
Matthew Simpson;Dina Demner-Fushman;Charles Sneiderman;Sameer K. Antani;George R. Thoma
Affiliations:
National Library of Medicine, NIH, Bethesda, MD;National Library of Medicine, NIH, Bethesda, MD;National Library of Medicine, NIH, Bethesda, MD;National Library of Medicine, NIH, Bethesda, MD;National Library of Medicine, NIH, Bethesda, MD
Venue:
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Year:
2009

Abstract

Automatic image annotation is an attractive approach for enabling convenient access to images found in a variety of documents. Since image captions and relevant discussions found in the text can be useful for summarizing the content of images, it is also possible that this text can be used to generate salient indexing terms. Unfortunately, this problem is generally domain-specific because indexing terms that are useful in one domain can be ineffective in others. Thus, we present a supervised machine learning approach to image annotation utilizing non-lexical features extracted from image-related text to select useful terms. We apply this approach to several subdomains of the biomedical sciences and show that we are able to reduce the number of ineffective indexing terms.