Natural language processing versus content-based image analysis for medical document retrieval

Authors:
Aurélie Névéol;Thomas M. Deserno;Stéfan J. Darmoni;Mark Oliver Güld;Alan R. Aronson
Affiliations:
U.S. National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894;U.S. National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894 and Department of Medical Informatics, Aachen University of Technology (RWTH), Pauwelsstra ...;CISMeF Group, Rouen University Hospital and GCSIS, LITIS EA 4051, Institute of BioMedical Research, University of Rouen, 1 rue de Germont, 76031 Rouen Cedex, France;Department of Medical Informatics, Aachen University of Technology (RWTH), Pauwelsstrasse 30, 52057, Aachen, Germany;U.S. National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894
Venue:
Journal of the American Society for Information Science and Technology
Year:
2009

Citing 0
Cited 6

Document retrieval using image features
Proceedings of the 2010 ACM Symposium on Applied Computing

Challenges of medical image processing
Computer Science - Research and Development

Towards a framework for abstractive summarization of multimodal documents
HLT-SS '11 Proceedings of the ACL 2011 Student Session

Abstractive summarization of line graphs from popular media
WASDGML '11 Proceedings of the Workshop on Automatic Summarization for Different Genres, Media, and Languages

MedFMI-SiR: a powerful DBMS solution for large-scale medical image retrieval
ITBAM'11 Proceedings of the Second international conference on Information technology in bio- and medical informatics

Integrating wavelets with clustering and indexing for effective content-based image retrieval
Knowledge-Based Systems

Abstract

One of the most significant recent advances in health information systems has been the shift from paper to electronic documents. While research on automatic text processing and automatic image processing has taken separate paths, there is a growing need for joint efforts, particularly for electronic health records and biomedical literature databases. This work compares text-based and image-based access to multimodal medical documents using state-of-the-art methods for processing their text and image components. A collection of 180 medical documents, each containing an image accompanied by a short text describing it, was divided into training and test sets. Content-based image analysis and natural language processing techniques were applied individually and in combination for multimodal document analysis. The evaluation consisted of an indexing task and a retrieval task, both based on the “gold standard” codes manually assigned to the corpus documents. The performance of text-based access, image-based access, and combined document features was compared. Image analysis proves more adequate than text analysis for both the indexing and the retrieval of the images. In the indexing task, multimodal analysis outperforms both image analysis alone and text analysis alone. This experiment shows that text describing images can be usefully analyzed in the framework of a hybrid text-image retrieval system. © 2009 Wiley Periodicals, Inc.