Multimodal fusion using learned text concepts for image categorization

Authors:
Qiang Zhu;Mei-Chen Yeh;Kwang-Ting Cheng
Affiliations:
University of California, Santa Barbara, CA;University of California, Santa Barbara, CA;University of California, Santa Barbara, CA
Venue:
MULTIMEDIA '06 Proceedings of the 14th annual ACM international conference on Multimedia
Year:
2006

This paper cites 22 works and is cited by 4.

References:
Solving the multiple instance problem with axis-parallel rectangles
Artificial Intelligence

On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
Machine Learning - Special issue on learning with probabilistic representations

Automatic text segmentation and text recognition for video indexing
Multimedia Systems

Support vector machine pairwise classifiers with error reduction for image classification
MULTIMEDIA '01 Proceedings of the 2001 ACM workshops on Multimedia: multimedia information retrieval

Saliency, Scale and Image Description
International Journal of Computer Vision

Multiple-Instance Learning for Natural Scene Classification
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning

Texture-Based Approach for Text Detection in Images Using Support Vector Machines and Continuously Adaptive Mean Shift Algorithm
IEEE Transactions on Pattern Analysis and Machine Intelligence

Distinctive Image Features from Scale-Invariant Keypoints
International Journal of Computer Vision

Optimal multimodal fusion for multimedia data analysis
Proceedings of the 12th annual ACM international conference on Multimedia

Incremental detection of text on road signs from video with application to a driving assistant system
Proceedings of the 12th annual ACM international conference on Multimedia

A Generalized Temporal Context Model for Semantic Scene Classification
CVPRW '04 Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04), Volume 6

Machine Learning And Statistical Modeling Approaches To Image Retrieval
Kluwer International Series on Information Retrieval

Generative versus Discriminative Methods for Object Recognition
CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Volume 2

A Bayesian Hierarchical Model for Learning Natural Scene Categories
CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Volume 2

A Performance Evaluation of Local Descriptors
IEEE Transactions on Pattern Analysis and Machine Intelligence

Combining Generative Models and Fisher Kernels for Object Recognition
ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05), Volume 1

Creating Efficient Codebooks for Visual Recognition
ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05), Volume 1

Object Categorization by Learned Universal Visual Dictionary
ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision, Volume 2

Multimodal metadata fusion using causal strength
Proceedings of the 13th annual ACM international conference on Multimedia

Text Locating Competition Results
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition

Image Analysis for Efficient Categorization of Image-based Spam E-mail
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition

Detecting and reading text in natural scenes
CVPR'04 Proceedings of the 2004 IEEE computer society conference on Computer vision and pattern recognition

Cited by:
An efficient method for text detection in video based on stroke width similarity
ACCV'07 Proceedings of the 8th Asian conference on Computer vision, Volume Part I

NEOCR: a configurable dataset for natural image text recognition
CBDAR'11 Proceedings of the 4th international conference on Camera-Based Document Analysis and Recognition

Object reading: text recognition for object recognition
ECCV'12 Proceedings of the 12th international conference on Computer Vision, Volume Part III

Con-text: text detection using background connectivity for fine-grained object classification
Proceedings of the 21st ACM international conference on Multimedia

Abstract

Conventional image categorization techniques rely primarily on low-level visual cues. In this paper, we describe a multimodal fusion scheme that improves image classification accuracy by incorporating information derived from the text detected within the image being classified. For each image category, a text concept is first learned from a set of labeled text samples taken from images of that category, using Multiple-Instance Learning [1]. For an image under classification that contains multiple detected text lines, we compute a weighted Euclidean distance between each text line and the learned text concept of the target category. The minimum of these distances is then used, together with low-level visual cues, as the feature vector for SVM-based classification. Experiments on a challenging image database show that the proposed fusion framework achieves higher accuracy than state-of-the-art image classification methods.
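The distance-and-fusion step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The text-line feature vectors, the learned concept point t and its per-dimension weights w are taken as given (in the paper they come from Multiple-Instance Learning). The distance is assumed to have the form sqrt(Σ_k w_k (x_k − t_k)²). The value used for images with no detected text (`no_text_value`) is a hypothetical placeholder, because the abstract does not say how such images are handled. All function names are illustrative.

```python
import math
from typing import List, Sequence


def weighted_euclidean(x: Sequence[float], concept: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Assumed form sqrt(sum_k w_k * (x_k - t_k)**2) between one text-line
    feature vector x and a learned concept point t."""
    if not (len(x) == len(concept) == len(weights)):
        raise ValueError("x, concept and weights must have the same length")
    return math.sqrt(sum(w * (xi - ti) ** 2
                         for xi, ti, w in zip(x, concept, weights)))


def min_concept_distance(text_lines: Sequence[Sequence[float]],
                         concept: Sequence[float],
                         weights: Sequence[float],
                         no_text_value: float) -> float:
    """Smallest distance from any detected text line to the category's concept.
    no_text_value is a hypothetical placeholder for images with no detected text."""
    if not text_lines:
        return no_text_value
    return min(weighted_euclidean(line, concept, weights) for line in text_lines)


def fused_feature(visual: Sequence[float],
                  text_lines: Sequence[Sequence[float]],
                  concept: Sequence[float],
                  weights: Sequence[float],
                  no_text_value: float) -> List[float]:
    """Append the min text-concept distance to the low-level visual features;
    the result would be the input vector to an SVM for the target category."""
    d = min_concept_distance(text_lines, concept, weights, no_text_value)
    return list(visual) + [d]


# Toy example: 2-D text-line features, concept at the origin, weights (1, 4).
lines = [[3.0, 0.0], [0.0, 1.0]]  # weighted distances 3.0 and 2.0
print(fused_feature([0.5, 0.25], lines, [0.0, 0.0], [1.0, 4.0], no_text_value=10.0))
# prints [0.5, 0.25, 2.0]
```

Because the square root is monotonic, taking the minimum over squared distances would select the same text line; only the scale of the appended feature would change.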