Fisher Linear Discriminant Analysis for text-image combination in multimedia information retrieval

Authors:
Christophe Moulin;Christine Largeron;Christophe Ducottet;Mathias Géry;Cécile Barat
Venue:
Pattern Recognition
Year:
2014

Abstract

In multimedia information retrieval, combining different modalities (text, image, audio or video) provides additional information and generally improves overall system performance. For this purpose, the linear combination method is presented as simple, flexible and effective. However, it requires choosing the weight assigned to each modality. This issue is still an open problem and is addressed in this paper. Our approach, based on Fisher Linear Discriminant Analysis, aims to learn these weights for multimedia documents composed of text and images. Text and images are both represented with the classical bag-of-words model. Our method was tested on the ImageCLEF 2008 and 2009 datasets. Results demonstrate that our combination approach not only outperforms the use of the textual modality alone but also provides nearly optimal learning of the weights with efficient computation. Moreover, the method can combine more than two modalities without increasing the complexity, and thus the computing time.
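
As an illustration of the idea in the abstract, the following is a minimal sketch of learning linear-combination weights with the textbook two-class Fisher criterion, w proportional to S_w^{-1}(m_rel - m_nonrel). It is not the authors' implementation: the abstract does not give their exact formulation, so the function names (`fisher_lda_weights`, `combine_scores`), the small ridge term and the layout of the score matrix (one row per document, one column per modality) are all assumptions made for this example.

```python
import numpy as np


def fisher_lda_weights(scores, relevant, ridge=1e-9):
    """Estimate one weight per modality with the two-class Fisher criterion.

    scores   : array of shape (n_docs, n_modalities), e.g. columns = [text, image]
    relevant : boolean array of shape (n_docs,), True for relevant documents
    ridge    : assumed small regulariser keeping the scatter matrix invertible
    """
    scores = np.asarray(scores, dtype=float)
    relevant = np.asarray(relevant, dtype=bool)

    pos, neg = scores[relevant], scores[~relevant]
    mean_pos, mean_neg = pos.mean(axis=0), neg.mean(axis=0)

    # Within-class scatter: sum of the scatter matrices of the two classes.
    centred_pos, centred_neg = pos - mean_pos, neg - mean_neg
    s_w = centred_pos.T @ centred_pos + centred_neg.T @ centred_neg
    s_w += ridge * np.eye(s_w.shape[0])

    # Fisher direction: solve S_w w = (m_pos - m_neg) instead of inverting S_w.
    return np.linalg.solve(s_w, mean_pos - mean_neg)


def combine_scores(scores, weights):
    """Linear combination of per-modality scores into one ranking score."""
    return np.asarray(scores, dtype=float) @ weights


if __name__ == "__main__":
    # Synthetic scores, purely illustrative (not ImageCLEF data).
    rng = np.random.default_rng(0)
    rel = rng.normal([2.0, 1.0], 0.5, size=(50, 2))
    non = rng.normal([0.0, 0.0], 0.5, size=(200, 2))
    all_scores = np.vstack([rel, non])
    labels = np.array([True] * 50 + [False] * 200)

    w = fisher_lda_weights(all_scores, labels)
    ranking = np.argsort(-combine_scores(all_scores, w))
    print("weights:", w)
    print("top-5 documents:", ranking[:5])
```

Because the weights come from solving a linear system whose size is the number of modalities, a third or fourth modality only adds a column to the score matrix. In this sketch the code itself stays unchanged.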