Information Fusion in Multimedia Information Retrieval
Adaptive Multimedia Retrieval: Retrieval, User, and Semantics
This paper proposes a novel representation space for multimodal information, enabling fast and efficient retrieval of video data. We suggest describing documents not directly by selected multimodal features (audio, visual or text), but rather by their cross-document similarities with respect to these multimodal characteristics. This idea leads us to propose a particular form of \emph{dissimilarity space} that is adapted to the asymmetric classification problem and, in turn, to the \emph{query-by-example} and \emph{relevance feedback} paradigms widely used in information retrieval. Based on the proposed dissimilarity space, we then define various strategies to fuse modalities through a kernel-based learning approach. The problem of automatically setting the kernel parameters to adapt the learning process to each query is also discussed. The properties of our strategies are studied and validated on artificial data. In a second phase, a large annotated video corpus (\emph{i.e.}, TRECVID-05), indexed by visual, audio and text features, is used to evaluate the overall performance of the dissimilarity space and fusion strategies. The results confirm the validity of the proposed approach for the representation and retrieval of multimodal information in a real-time framework.
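To make the representation concrete, the following is a minimal sketch (not the authors' implementation) of the two steps outlined in the abstract: mapping each document's per-modality features into a dissimilarity space spanned by a set of prototype documents, and fusing modalities as a weighted combination of per-modality kernels for query-by-example ranking. The function names, the Euclidean distances, the RBF kernel and the fixed fusion weights are illustrative assumptions; the paper's actual features, kernel choices and automatic kernel-setting strategy are not reproduced here.

```python
# Minimal sketch of a dissimilarity-space representation with
# kernel-based late fusion of modalities (illustrative assumptions only).
import numpy as np

def dissimilarity_space(features, prototypes):
    """Represent each document by its Euclidean distances to a set of
    prototype documents, instead of by its raw feature vector."""
    # features: (n_docs, dim), prototypes: (n_protos, dim)
    diff = features[:, None, :] - prototypes[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))          # (n_docs, n_protos)

def rbf_kernel(D_query, D_corpus, gamma=1.0):
    """RBF kernel between dissimilarity vectors of queries and corpus items."""
    diff = D_query[:, None, :] - D_corpus[None, :, :]
    return np.exp(-gamma * (diff ** 2).sum(axis=-1))  # (n_queries, n_docs)

def fuse_kernels(kernels, weights):
    """Late fusion: convex combination of per-modality kernels."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))

# Toy usage: two modalities (e.g. visual and text) over 100 documents.
rng = np.random.default_rng(0)
visual = rng.normal(size=(100, 64))
text = rng.normal(size=(100, 32))
protos = rng.choice(100, size=20, replace=False)      # prototype documents

D_vis = dissimilarity_space(visual, visual[protos])
D_txt = dissimilarity_space(text, text[protos])

query = [3]                                           # query-by-example: doc 3
K = fuse_kernels(
    [rbf_kernel(D_vis[query], D_vis), rbf_kernel(D_txt[query], D_txt)],
    weights=[0.6, 0.4],
)
ranking = np.argsort(-K[0])                           # most similar documents first
```

In a relevance-feedback setting such as the one targeted by the paper, the fusion weights and kernel parameters would be re-estimated from the user-labelled positive and negative examples rather than fixed as in this toy example.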