A framework for using context to understand images of people

  • Authors:
  • Tsuhan Chen; Andrew C. Gallagher

  • Affiliations:
  • Carnegie Mellon University; Carnegie Mellon University

  • Venue:
  • Carnegie Mellon University (Ph.D. dissertation)
  • Year:
  • 2009

Abstract

When we see other humans, we can quickly make judgements regarding many aspects, including their demographic description and identity if they are familiar to us. We can answer questions about the activities of, emotional states of, and relationships between people in an image. We draw conclusions based not just on what we see, but also on a lifetime of experience living and interacting with other people. In this dissertation, we propose contextual features and models for understanding images of people, with the objective of providing computers with access to the same contextual information that humans use.

We show through a series of visual experiments that humans exploit contextual knowledge to understand images of people. Recognizing other people becomes easier when the full body is shown instead of just the face. Social context is exploited to assign faces to corresponding first names, and age and gender recognition improves when subjects see a face together with the other faces from the same image rather than a single face in isolation.

In this dissertation, we provide contextual features and probabilistic frameworks that allow the computer to interpret images of people with contextual information. We propose features related to clothing, groups of associated people, relative positions of people, first name popularity, anthropometric measurements, and social relationships. The contextual features are learned from image data and from publicly available data from research organizations. By considering context, we show improvement on a number of understanding tasks related to images of people. When applied to collections containing multiple people, we show that context improves the identification of others in the collection. When considering single images, we show that context allows us to improve estimates of demographic descriptions of age and gender, as well as to determine the most likely owner of a first name such as "Taylor".
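The first-name task lends itself to a simple probabilistic sketch. The example below is illustrative only, not the dissertation's actual model: it combines a hypothetical name-gender prior (of the kind derivable from public birth-record statistics) with a soft gender estimate from an appearance classifier to pick the most likely owner of a given name. All names, values, and the `most_likely_owner` function are assumptions made for this sketch.

```python
# Hypothetical name-gender prior, of the kind derivable from public
# birth-record statistics; the probabilities here are illustrative only.
P_GENDER_GIVEN_NAME = {
    "Taylor": {"female": 0.75, "male": 0.25},
    "James": {"female": 0.02, "male": 0.98},
}

def most_likely_owner(name, faces):
    """Return the face most likely to own `name`.

    `faces` holds soft gender estimates from an appearance-based
    classifier, e.g. {"id": 0, "p_female": 0.9}. The score marginalizes
    over gender: P(name | face) is proportional to
    P(female | name) * P(female | face) + P(male | name) * P(male | face).
    """
    prior = P_GENDER_GIVEN_NAME[name]

    def score(face):
        return (prior["female"] * face["p_female"]
                + prior["male"] * (1.0 - face["p_female"]))

    return max(faces, key=score)

faces = [{"id": 0, "p_female": 0.9}, {"id": 1, "p_female": 0.2}]
print(most_likely_owner("Taylor", faces)["id"])  # → 0
print(most_likely_owner("James", faces)["id"])   # → 1
```

The same marginalization extends naturally to richer priors, e.g. conditioning name popularity on birth year to couple the name and age estimates.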
Finally, we show that context allows us to perform high-level tasks such as segmenting rows of people and identifying the horizon in a single image of a group of people. This work shows that people act in predictable ways, for example that human patterns of association contain regular structure that can be effectively modeled and learned. From a broad perspective, this work shows that by exploiting information that is learned about people (in any field of science) we can improve our understanding of images of people.
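The horizon task can likewise be illustrated with a small sketch. Under the simplifying assumptions that everyone stands on a common ground plane and has roughly the same height, a face's vertical image position is approximately linear in its apparent size, and the intercept of that line (face size extrapolated to zero, i.e. infinite distance) falls on the horizon. The code below fits that line by least squares; it is a toy stand-in for the dissertation's full geometric model, and the function name and coordinates are assumptions made for this sketch.

```python
def estimate_horizon(faces):
    """Estimate the horizon's vertical image coordinate (in pixels).

    `faces` is a list of (y_center, size) pairs for detected faces.
    Assuming a common ground plane and similar person heights, y_center
    is roughly linear in face size; the least-squares intercept (size
    extrapolated to zero, i.e. infinite distance) lies on the horizon.
    """
    n = len(faces)
    mean_s = sum(s for _, s in faces) / n
    mean_y = sum(y for y, _ in faces) / n
    cov = sum((s - mean_s) * (y - mean_y) for y, s in faces)
    var = sum((s - mean_s) ** 2 for _, s in faces)
    slope = cov / var
    return mean_y - slope * mean_s  # fitted y at face size 0

# Three faces receding into the scene: larger faces sit lower in the image.
print(estimate_horizon([(120, 10), (140, 20), (160, 30)]))  # → 100.0
```

With real detections the fit is noisy, so a robust regression (e.g. RANSAC over face pairs) would be the natural next step.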