Visual words dictionaries and fusion techniques for searching people through textual and visual attributes

Authors:
Junior Fabian; Ramon Pires; Anderson Rocha
Venue:
Pattern Recognition Letters
Year:
2014



Abstract

Using personal traits for searching people is paramount in several application areas and has attracted ever-growing attention from the scientific community over the past years. Practical applications in digital forensics and surveillance include locating a suspect or finding missing people in a public space. In this paper, we aim to assign describable visual attributes (e.g., white chubby male wearing glasses and with bangs) as labels to images to describe their appearance, and to perform visual searches without relying on image annotations during testing. To that end, we create mid-level image representations for face images based on visual dictionaries that link visual properties in the images to describable attributes. In addition, we take advantage of machine learning techniques for combining different attributes and performing a query. First, we propose three methods for building the visual dictionaries. Method #1 uses a sparse-sampling scheme to obtain low-level features and a clustering algorithm to build the visual dictionaries. Method #2 uses dense sampling to obtain low-level features and random selection to build the visual dictionaries, while Method #3 uses dense sampling to obtain low-level features followed by a clustering algorithm to build the visual dictionaries. Thereafter, we train two-class classifiers for the describable visual attributes of interest; each classifier assigns to every image a decision score that is used to obtain its ranking. For more complex queries (two or more attributes), we use three state-of-the-art approaches for combining the rankings: (1) product of probabilities, (2) rank aggregation and (3) rank position. To date, we have considered fifteen attribute classifiers and, consequently, their direct counterparts, theoretically allowing 2^15 = 32,768 different combined queries (the actual number is smaller since some attributes are contradictory or mutually exclusive). Nevertheless, the method is easily extensible to include new attributes. Experimental results show that Method #3 greatly improves retrieval precision for some attributes in comparison with other methods in the literature. Finally, for combined attributes, product of probabilities, rank aggregation and rank position yield complementary results for rank fusion and the final decision making, suggesting interesting possible combinations for further work.