Solving the multiple instance problem with axis-parallel rectangles
Artificial Intelligence
On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
Machine Learning - Special issue on learning with probabilistic representations
Automatic text segmentation and text recognition for video indexing
Multimedia Systems
Support vector machine pairwise classifiers with error reduction for image classification
MULTIMEDIA '01 Proceedings of the 2001 ACM workshops on Multimedia: multimedia information retrieval
Saliency, Scale and Image Description
International Journal of Computer Vision
Multiple-Instance Learning for Natural Scene Classification
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence
Distinctive Image Features from Scale-Invariant Keypoints
International Journal of Computer Vision
Optimal multimodal fusion for multimedia data analysis
Proceedings of the 12th annual ACM international conference on Multimedia
Proceedings of the 12th annual ACM international conference on Multimedia
A Generalized Temporal Context Model for Semantic Scene Classification
CVPRW '04 Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04) - Volume 6
Machine Learning And Statistical Modeling Approaches To Image Retrieval (Kluwer International Series on Information Retrieval)
Generative versus Discriminative Methods for Object Recognition
CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2
A Bayesian Hierarchical Model for Learning Natural Scene Categories
CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2
A Performance Evaluation of Local Descriptors
IEEE Transactions on Pattern Analysis and Machine Intelligence
Combining Generative Models and Fisher Kernels for Object Recognition
ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05) - Volume 1
Creating Efficient Codebooks for Visual Recognition
ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05) - Volume 1
Object Categorization by Learned Universal Visual Dictionary
ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
Multimodal metadata fusion using causal strength
Proceedings of the 13th annual ACM international conference on Multimedia
Text Locating Competition Results
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Image Analysis for Efficient Categorization of Image-based Spam E-mail
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Detecting and reading text in natural scenes
CVPR'04 Proceedings of the 2004 IEEE computer society conference on Computer vision and pattern recognition
An efficient method for text detection in video based on stroke width similarity
ACCV'07 Proceedings of the 8th Asian conference on Computer vision - Volume Part I
NEOCR: a configurable dataset for natural image text recognition
CBDAR'11 Proceedings of the 4th international conference on Camera-Based Document Analysis and Recognition
Object reading: text recognition for object recognition
ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume Part III
Con-text: text detection using background connectivity for fine-grained object classification
Proceedings of the 21st ACM international conference on Multimedia
Conventional image categorization techniques rely primarily on low-level visual cues. In this paper, we describe a multimodal fusion scheme that improves image classification accuracy by incorporating information derived from the embedded text detected in the image under classification. For each image category, a text concept is first learned from a set of labeled texts in images of that category using Multiple Instance Learning [1]. For an image under classification that contains multiple detected text lines, we calculate a weighted Euclidean distance between each text line and the learned text concept of the target category. The minimum of these distances is then used jointly with the low-level visual cues as the feature set for SVM-based classification. Experiments on a challenging image database demonstrate that the proposed fusion framework achieves higher accuracy than state-of-the-art methods for image classification.
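The distance and fusion step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the feature-vector representation of a text line, the exact form of the weighting (a square root of a per-dimension weighted sum of squared differences), and the fallback value for images with no detected text are all assumptions made for illustration. The learned text concept and its weights are taken as given inputs, since the abstract does not specify how Multiple Instance Learning produces them.

```python
import math
from typing import List, Sequence


def weighted_distance(line: Sequence[float],
                      concept: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Weighted Euclidean distance between one text-line feature vector
    and a category's text concept (assumed form: sqrt(sum w_k * d_k^2))."""
    return math.sqrt(sum(w * (x - c) ** 2
                         for x, c, w in zip(line, concept, weights)))


def min_concept_distance(text_lines: Sequence[Sequence[float]],
                         concept: Sequence[float],
                         weights: Sequence[float],
                         no_text_distance: float = 1e6) -> float:
    """Smallest distance from any detected text line to the concept.

    no_text_distance is a hypothetical fallback for images without
    detected text; the abstract does not say how that case is handled.
    """
    if not text_lines:
        return no_text_distance
    return min(weighted_distance(line, concept, weights)
               for line in text_lines)


def fused_feature(visual: Sequence[float],
                  text_lines: Sequence[Sequence[float]],
                  concept: Sequence[float],
                  weights: Sequence[float]) -> List[float]:
    """Append the minimum text distance to the low-level visual features.

    The resulting vector is what would be passed to an SVM classifier.
    """
    return list(visual) + [min_concept_distance(text_lines, concept, weights)]


# Toy values, chosen only to make the arithmetic easy to follow.
concept = [0.0, 0.0]
weights = [1.0, 1.0]
lines = [[3.0, 4.0], [1.0, 0.0]]
feature = fused_feature([0.2, 0.5], lines, concept, weights)
```

In this toy case the two text lines lie at distances 5.0 and 1.0 from the concept, so the fused vector is the two visual values followed by 1.0. Taking the minimum reflects the multiple-instance assumption that a single text line matching the concept is enough to mark the image as relevant to the category.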