Learning invariance through imitation

Authors:
G. W. Taylor;I. Spiro;C. Bregler;R. Fergus
Affiliations:
Dept. of Comput. Sci., New York Univ., New York, NY, USA;Dept. of Comput. Sci., New York Univ., New York, NY, USA;Dept. of Comput. Sci., New York Univ., New York, NY, USA;Dept. of Comput. Sci., New York Univ., New York, NY, USA
Venue:
CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition
Year:
2011

Citing 0
Cited 2

Crowdsourced data collection of facial responses
ICMI '11 Proceedings of the 13th international conference on multimodal interfaces

Motion chain: a webcam game for crowdsourcing gesture collection
CHI '12 Extended Abstracts on Human Factors in Computing Systems

Abstract

Supervised methods for learning an embedding aim to map high-dimensional images to a space in which perceptually similar observations have high measurable similarity. Most approaches rely on binary similarity, typically defined by class membership, for which labels are expensive to obtain and/or difficult to define. In this paper we propose crowd-sourcing similar images by soliciting human imitations. We exploit temporal coherence in video to generate additional pairwise graded similarities between the user-contributed imitations. We introduce two methods for learning nonlinear, invariant mappings that exploit graded similarities. We learn a model that is highly effective at matching people in similar pose. It exhibits remarkable invariance to identity, clothing, background, lighting, shift, and scale.
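
The abstract does not give formulas, so the following is only a minimal sketch of how graded pairwise similarities from temporal coherence could feed an embedding loss. Everything in it is an assumption made for illustration: the exponential decay of similarity with frame offset, the parameter `tau`, the soft contrastive loss, and the function names `temporal_similarity` and `graded_contrastive_loss` are hypothetical and are not taken from the paper.

```python
import numpy as np


def temporal_similarity(num_frames, tau=2.0):
    """Graded similarity between frames i and j of one video.

    Assumed form (not from the paper): exp(-|i - j| / tau), so that
    neighbouring frames are near 1 and distant frames decay toward 0.
    """
    idx = np.arange(num_frames)
    return np.exp(-np.abs(idx[:, None] - idx[None, :]) / tau)


def graded_contrastive_loss(z, sim, margin=1.0):
    """Soft contrastive loss averaged over all distinct pairs.

    z:   (n, d) array of embeddings.
    sim: (n, n) array of similarities in [0, 1].
    Similar pairs are pulled together in proportion to sim; dissimilar
    pairs are pushed apart, up to the margin, in proportion to 1 - sim.
    This is an illustrative loss, not necessarily either of the paper's
    two methods.
    """
    diff = z[:, None, :] - z[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pull = sim * dist ** 2
    push = (1.0 - sim) * np.maximum(0.0, margin - dist) ** 2
    off_diagonal = ~np.eye(z.shape[0], dtype=bool)  # skip self-pairs
    return (pull + push)[off_diagonal].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sim = temporal_similarity(5)
    z = rng.normal(size=(5, 3))  # stand-in for embeddings of 5 frames
    print(graded_contrastive_loss(z, sim))
```

In a real system `z` would come from a learned nonlinear mapping applied to the images, and the loss would be minimised with respect to that mapping's parameters; here random embeddings stand in only to show the shapes involved.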