Learning to summarize web image and text mutually

Authors:
Piji Li; Jun Ma; Shuai Gao
Affiliations:
Shandong University, Jinan, China (all three authors)
Venue:
Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
Year:
2012

Abstract

We consider the problem of learning to summarize images with text and to visualize text with images, which we call Mutual-Summarization. We divide the web image-text data space into three subspaces: the pure image space (PIS), the pure text space (PTS), and the image-text joint space (ITJS), and we treat the ITJS as a knowledge base. To summarize an image with sentences, we map it from the PIS to the ITJS using image classification models and apply text summarization to the corresponding texts in the ITJS. To visualize a text, we map it from the PTS to the ITJS using text categorization models and select semantically related images from the ITJS, ranked by their confidence. In these approaches, images are represented by color histograms, dense visual words, and feature descriptors at different levels of a spatial pyramid, and texts are represented using the Latent Dirichlet Allocation (LDA) topic model. Multiple Kernel (MK) learning methods are used to train the image classifiers and the text classifiers. We present Mutual-Summarization results on our newly collected dataset of six major events ("Gulf Oil Spill", "Haiti Earthquake", etc.) and demonstrate improved cross-media retrieval performance over existing methods in terms of MAP, precision, and recall.
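The abstract describes images by several feature channels (color histograms, dense visual words, descriptors at spatial-pyramid levels) and uses Multiple Kernel (MK) methods to learn classifiers. As an illustration only, the sketch below shows the simplest form of kernel combination: a fixed-weight convex sum of per-channel histogram-intersection kernels, producing a precomputed Gram matrix of the kind an SVM with a precomputed-kernel mode (for example LIBSVM) can consume. The channel names, the choice of histogram intersection, and the weights are assumptions, not the paper's settings; a real MK learner would optimize the weights rather than fix them, and pyramid levels could be added as further channels.

```python
from typing import Dict, List, Sequence

# A feature channel is a named, non-negative histogram (e.g. a color
# histogram or a bag of dense visual words). Channel names are hypothetical.
Histogram = Sequence[float]
Features = Dict[str, Histogram]


def histogram_intersection(h1: Histogram, h2: Histogram) -> float:
    """Histogram-intersection kernel: sum of bin-wise minima."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def combined_kernel(x: Features, y: Features, weights: Dict[str, float]) -> float:
    """Convex combination K(x, y) = sum_m beta_m * K_m(x_m, y_m).

    The weights beta_m are fixed here (assumption); an MK learner would
    optimize them jointly with the classifier instead.
    """
    if any(w < 0 for w in weights.values()) or abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * histogram_intersection(x[name], y[name])
               for name, w in weights.items())


def gram_matrix(samples: List[Features], weights: Dict[str, float]) -> List[List[float]]:
    """Precomputed kernel matrix over all pairs of samples."""
    return [[combined_kernel(a, b, weights) for b in samples] for a in samples]


# Toy example: two images, each with two hypothetical, normalized channels.
img_a = {"color": [0.5, 0.3, 0.2], "visual_words": [0.1, 0.6, 0.3]}
img_b = {"color": [0.4, 0.4, 0.2], "visual_words": [0.3, 0.3, 0.4]}
beta = {"color": 0.4, "visual_words": 0.6}
K = gram_matrix([img_a, img_b], beta)
print([[round(v, 3) for v in row] for row in K])
```

Because the histogram-intersection kernel is positive semidefinite on non-negative histograms, and a non-negative combination of such kernels is too, the combined matrix remains a valid SVM kernel.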
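Both directions of Mutual-Summarization share one pattern: a query from a pure space is assigned to a category of the image-text joint space (ITJS) by a classifier, and the answer is drawn from the paired items stored there. The sketch below illustrates only that control flow under simplifying assumptions. The `JointItem` structure, the category labels, the keyword-based toy classifiers standing in for the trained image and text models, and the stored per-image confidence values are all hypothetical. The "summarizer" is a placeholder that returns the first sentences of the matching texts; it is not the text summarization method used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Scorer = Callable[[str], Dict[str, float]]  # query -> score per ITJS category


@dataclass
class JointItem:
    """One image-text pair in a hypothetical ITJS knowledge base."""
    category: str            # e.g. the event the pair belongs to
    image_id: str
    text: str
    image_confidence: float  # hypothetical image-classifier score for `category`


def best_category(scores: Dict[str, float]) -> str:
    return max(scores, key=lambda c: scores[c])


def summarize_image(image: str, image_classifier: Scorer,
                    itjs: List[JointItem], n_sentences: int = 2) -> str:
    """PIS -> ITJS: classify the image, then summarize that category's texts."""
    category = best_category(image_classifier(image))
    texts = [item.text for item in itjs if item.category == category]
    # Placeholder summarizer (assumption): keep the first n sentences.
    sentences = [s.strip() for t in texts for s in t.split(".") if s.strip()]
    return ". ".join(sentences[:n_sentences]) + ("." if sentences else "")


def visualize_text(text: str, text_classifier: Scorer,
                   itjs: List[JointItem], k: int = 3) -> List[str]:
    """PTS -> ITJS: categorize the text, then return top-k images by confidence."""
    category = best_category(text_classifier(text))
    candidates = [item for item in itjs if item.category == category]
    candidates.sort(key=lambda item: item.image_confidence, reverse=True)
    return [item.image_id for item in candidates[:k]]


# Toy knowledge base and stand-in classifiers (all hypothetical).
itjs = [
    JointItem("oil_spill", "img1", "Oil reached the coast. Crews deployed booms.", 0.9),
    JointItem("oil_spill", "img2", "The well was later capped.", 0.7),
    JointItem("earthquake", "img3", "Rescue teams searched the rubble.", 0.95),
]


def toy_text_classifier(text: str) -> Dict[str, float]:
    t = text.lower()
    return {"oil_spill": float("oil" in t),
            "earthquake": float("quake" in t or "rubble" in t)}


def toy_image_classifier(image_id: str) -> Dict[str, float]:
    # Pretend the image features already reveal which event is depicted.
    if image_id.startswith("beach"):
        return {"oil_spill": 0.8, "earthquake": 0.2}
    return {"oil_spill": 0.1, "earthquake": 0.9}


print(summarize_image("beach_photo", toy_image_classifier, itjs))
print(visualize_text("Oil slick spreads toward the shore", toy_text_classifier, itjs))
```

Keeping the ITJS as a shared store means the same lookup serves both directions; only the classifier used for the mapping and the kind of item returned differ.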