Given the overwhelming amount of multimedia content on the Web, methods for searching and understanding it through sentences are necessary. Representing content not only with labels but with sentences that express the relations among those labels lets users search with a story and understand multimedia more deeply. However, few existing works generate such sentences, because extracting objects' relations and producing grammatical text are difficult. We specifically examine the captions of images that are similar to an input image; such captions are expected to describe the input image to some degree. We therefore propose a novel approach that generates a sentential caption for an input image by summarizing those captions. An experiment on a dataset of images and text demonstrates that the proposed method can generate sentential captions.
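The summarization step can be illustrated with a minimal sketch. The abstract does not specify the summarization algorithm, so the following is only an assumed stand-in: after retrieving the captions of similar images, pick the single caption that is most similar on average to all the others under a bag-of-words cosine measure (a simple extractive "centroid" summary). The function name `summarize_captions` and the example captions are hypothetical.

```python
from collections import Counter
import math

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def summarize_captions(captions):
    """Return the caption most similar on average to the others:
    a simple extractive stand-in for sentence summarization."""
    bows = [Counter(c.lower().split()) for c in captions]
    best = max(range(len(captions)),
               key=lambda i: sum(cosine(bows[i], bows[j])
                                 for j in range(len(bows)) if j != i))
    return captions[best]

# Captions retrieved from hypothetical similar images:
captions = [
    "a dog runs on the beach",
    "a brown dog running along the beach",
    "two people walk in a park",
]
print(summarize_captions(captions))  # -> "a dog runs on the beach"
```

The outlier caption about the park contributes little overlap, so the most representative beach caption is selected; a real system would instead fuse or regenerate the sentence rather than merely select one.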