In real life, it is easier to provide a visual cue when asking a question about a possibly unfamiliar topic, for example, "Where was this crop circle found?". Providing an image of the instance is far more convenient than typing a verbose description of its visual properties, especially when the name of the query instance is not known. Nevertheless, having to identify the visual instance before processing the question and eventually returning the answer makes multimodal question answering technically challenging. This paper addresses the problem of visual-to-text naming through the paradigm of answering-by-search in a two-stage computational framework composed of instance search (IS) and similar question ranking (QR). In IS, the names of instances are inferred from similar visual examples retrieved from a million-scale image dataset. To recall instances of non-planar and non-rigid shapes, spatial configurations that emphasize topology consistency while allowing for local variations in matches are incorporated. In QR, the candidate names of the instance are statistically identified from the search results and used directly to retrieve similar questions from community-contributed QA (cQA) archives. By parsing questions into syntactic trees, fuzzy matching between the inquirer's question and the cQA questions is performed to locate answers and to recommend related questions to the inquirer. The proposed framework is evaluated on a wide range of visual instances (e.g., fashion, art, food, pet, logo, and landmark) over various QA categories (e.g., factoid, definition, how-to, and opinion).
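The two-stage flow described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the rank-weighted vote stands in for the unspecified statistical name identification, and plain Jaccard token overlap stands in for the syntactic tree matching. The function names `infer_candidate_names` and `rank_questions`, the `top_k` parameter, and all sample data are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def infer_candidate_names(search_results, top_k=1):
    """Pick candidate instance names by a rank-weighted vote.

    search_results is a ranked list; each entry holds the names attached
    to one retrieved image. Assumption: weight 1/rank per name occurrence.
    """
    votes = Counter()
    for rank, names in enumerate(search_results, start=1):
        for name in set(names):
            votes[name] += 1.0 / rank
    return [name for name, _ in votes.most_common(top_k)]


def _tokens(text):
    """Lowercase alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_questions(question, names, archive):
    """Order archived questions by Jaccard overlap with the named query.

    Assumption: token overlap is a simple stand-in for the syntactic
    tree matching mentioned in the abstract.
    """
    query = _tokens(question) | _tokens(" ".join(names))

    def score(candidate):
        cand = _tokens(candidate)
        union = query | cand
        return len(query & cand) / len(union) if union else 0.0

    return sorted(archive, key=score, reverse=True)


# Made-up data: names attached to the top three retrieved images.
results = [["crop circle", "wheat field"], ["crop circle"], ["labyrinth"]]
archive = [
    "How do I bake bread?",
    "Where are crop circles usually found?",
    "What is a labyrinth?",
]

names = infer_candidate_names(results)
ranked = rank_questions("Where was this crop circle found?", names, archive)
print(names, ranked[0])
```

In this sketch the inferred name is appended to the inquirer's question before matching, which mirrors the idea that the visual instance must be named before similar questions can be retrieved.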