Towards decrypting attractiveness via multi-modality cues

  • Authors:
  • Tam V. Nguyen; Si Liu; Bingbing Ni; Jun Tan; Yong Rui; Shuicheng Yan

  • Affiliations:
  • National University of Singapore, Singapore; National University of Singapore, Singapore; Advanced Digital Sciences Center, Singapore; National University of Defense Technology, Hunan, China; Microsoft Research Asia, Beijing, China; National University of Singapore, Singapore

  • Venue:
  • ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
  • Year:
  • 2013

Abstract

Decrypting the secret of beauty or attractiveness has been the pursuit of artists and philosophers for centuries. To date, computational models for attractiveness estimation have been actively explored in the computer vision and multimedia communities, though mainly with a focus on facial features. In this article, we conduct a comprehensive study of female attractiveness conveyed by single or multiple modalities of cues, namely face, dressing, and/or voice, and aim to discover how different modalities individually and collectively affect the human sense of beauty. To investigate the problem extensively, we collect the Multi-Modality Beauty (M2B) dataset, which is annotated with attractiveness levels converted from manual k-wise ratings as well as semantic attributes for the different modalities. Motivated by the common consensus that mid-level attribute prediction can assist higher-level computer vision tasks, we manually label a set of attributes for each modality. We then propose a tri-layer Dual-supervised Feature-Attribute-Task (DFAT) network to jointly learn the attribute and attractiveness models of single/multiple modalities. To remedy the possible loss of information caused by an incomplete set of manual attributes, we further propose a novel Latent Dual-supervised Feature-Attribute-Task (LDFAT) network, in which latent attributes are combined with the manual attributes to contribute to the final attractiveness estimation. Extensive experimental evaluations on the collected M2B dataset demonstrate the effectiveness of the proposed DFAT and LDFAT networks for female attractiveness prediction.
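
The abstract describes the DFAT network only at a high level: a feature layer feeds an attribute layer, which in turn feeds the attractiveness-prediction task, with supervision applied at both the attribute layer and the task layer. The sketch below illustrates that tri-layer, dually supervised wiring in PyTorch. It is an assumption-laden illustration, not the paper's implementation: the layer sizes, the sigmoid/BCE attribute head, the MSE attractiveness loss, and the weighting `lambda_attr` are all hypothetical.

```python
import torch
import torch.nn as nn

class DFATSketch(nn.Module):
    """Hypothetical tri-layer Feature-Attribute-Task network:
    features -> attribute layer (supervised by manual attribute labels)
             -> task layer (supervised by attractiveness scores)."""
    def __init__(self, feat_dim, n_attrs, hidden=128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.attr_head = nn.Linear(hidden, n_attrs)  # mid-level attributes
        self.task_head = nn.Linear(n_attrs, 1)       # attractiveness score

    def forward(self, x):
        h = self.feature(x)
        attrs = torch.sigmoid(self.attr_head(h))  # attribute probabilities
        score = self.task_head(attrs)             # task prediction from attributes
        return attrs, score

# Dual supervision: the total loss combines an attribute loss and a task loss.
model = DFATSketch(feat_dim=512, n_attrs=40)
attr_loss = nn.BCELoss()
task_loss = nn.MSELoss()
lambda_attr = 0.5  # hypothetical weighting between the two supervision signals

x = torch.randn(8, 512)                        # batch of (multi-modal) features
y_attr = torch.randint(0, 2, (8, 40)).float()  # manual attribute labels
y_score = torch.rand(8, 1)                     # attractiveness levels

attrs, score = model(x)
loss = lambda_attr * attr_loss(attrs, y_attr) + task_loss(score, y_score)
loss.backward()
```

Under the same assumptions, the LDFAT variant could be sketched by enlarging the attribute layer with extra latent units that receive no manual labels: the attribute loss would be restricted to the manually labeled dimensions, while the latent dimensions are trained only through the task-level supervision.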