Probabilistic models for topic learning from images and captions in online biomedical literatures

Authors:
Xin Chen;Caimei Lu;Yuan An;Palakorn Achananuparp
Affiliations:
College of Information Science & Technology, Drexel University, Philadelphia, PA, USA;College of Information Science & Technology, Drexel University, Philadelphia, PA, USA;College of Information Science & Technology, Drexel University, Philadelphia, PA, USA;College of Information Science & Technology, Drexel University, Philadelphia, PA, USA
Venue:
Proceedings of the 18th ACM conference on Information and knowledge management
Year:
2009

Abstract

Biomedical images and their captions are among the major sources of information in online biomedical publications. They often contain the most important results to be reported and provide rich information about the main themes of published papers. In the data mining and information retrieval communities, much effort has gone into using text mining and language modeling algorithms to extract knowledge from the text content of online biomedical publications; however, the problem of knowledge extraction from biomedical images and captions has not yet been fully studied. In this paper, a hierarchical probabilistic topic model with background distribution (HPB) is introduced to uncover the latent semantic topics from the co-occurrence patterns of caption words, visual words and biomedical concepts. From downloaded biomedical figures, restricted captions are extracted for each individual image panel. During the indexing stage, the 'bag-of-words' representation of captions is supplemented by ontology-based concept indexing to alleviate the synonymy and polysemy problems. As the visual counterpart of text words, visual words are extracted from the corresponding image panels and indexed. The model is estimated via a collapsed Gibbs sampling algorithm. We compare the performance of our model with that of an extension of the Correspondence LDA (Corr-LDA) model under the same biomedical image annotation scenario using cross-validation. Experimental results demonstrate that our model is able to accurately extract latent patterns from complicated biomedical image-caption pairs and to facilitate knowledge organization and understanding in online biomedical literature.
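
The abstract names collapsed Gibbs sampling as the estimation method but does not give the HPB model's sampling equations. As a rough illustration of the general technique only, the sketch below runs collapsed Gibbs sampling for plain LDA over documents of integer word ids. It is not the HPB model: it has no background distribution, no visual words and no concept layer. The function name, the hyperparameters `alpha` and `beta`, and the toy corpus are all assumptions made for the example.

```python
import random


def lda_collapsed_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
                        n_iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustrative, not HPB).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns the document-topic counts, topic-word counts and assignments z.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # n_dk
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n_kw
    topic_total = [0] * n_topics                              # n_k
    z = []

    # Assign every token a random topic and build the count tables.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        z.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalised full conditional over topics for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]

                # Resample the topic and restore the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word, z


# Toy corpus (hypothetical): word ids 0-2 and 3-4 tend to co-occur.
toy_docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4], [0, 1, 1], [4, 3, 3, 4, 4]]
doc_topic, topic_word, z = lda_collapsed_gibbs(toy_docs, n_topics=2,
                                               vocab_size=5, n_iters=50)
```

In a multimodal setting such as the one described above, caption words, visual words and concepts would each need their own count tables and conditionals; the single-vocabulary version here only shows the remove, reweight and resample cycle that collapsed samplers share.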