Detecting and identifying people in mobile videos

Authors: Xunyi Yu; Aura Ganz
Affiliations: University of Massachusetts, Amherst, Amherst, MA, USA (both authors)
Venue: MM '11 Proceedings of the 19th ACM international conference on Multimedia
Year: 2011

Abstract

In this paper, we propose a system that detects and identifies people in videos captured by smartphones. We discuss the challenges of extending existing location-aware multimedia applications from annotating distant static landmarks to annotating moving people at close range with significant pose variations. We propose a hybrid video and RF tracking system that enables accurate localization of both the observer and the targets, and we extract part models composed of Maximally Stable Color Regions for each target. These models efficiently detect candidate positions of targets in the video, which then serve as dynamic landmarks for calibrating the camera orientation. Finally, the positions of all targets in the video are jointly estimated using both visual features and spatial constraints. Experiments show that our approach locates identified targets in video with significantly higher accuracy than back-projection based on camera orientation estimates from accelerometers and magnetometers.
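
To make the back-projection baseline and the idea of calibrating orientation from dynamic landmarks more concrete, the following is a minimal sketch and not the method described in the paper. It assumes a flat 2D ground plane (x east, y north), an ideal pinhole camera with a known focal length in pixels, and observer and target positions already available (for example, from RF tracking). All function names and parameters are hypothetical and chosen for illustration only.

```python
import math

def bearing(observer, target):
    """Bearing from observer to target in radians, clockwise from north (+y)."""
    dx = target[0] - observer[0]  # east offset
    dy = target[1] - observer[1]  # north offset
    return math.atan2(dx, dy)

def wrap(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def project_column(observer, target, yaw, focal_px, width):
    """Back-project a target to a pixel column (assumed pinhole model).

    yaw is the camera heading, clockwise from north. Returns None when
    the target is not in front of the camera.
    """
    rel = wrap(bearing(observer, target) - yaw)
    if abs(rel) >= math.pi / 2:
        return None
    return width / 2 + focal_px * math.tan(rel)

def calibrate_yaw(observer, targets, columns, focal_px, width):
    """Estimate camera yaw from detected target columns.

    Each detection yields one yaw estimate (bearing minus the angle implied
    by the pixel column); the estimates are combined with a circular mean.
    """
    s = c = 0.0
    for target, u in zip(targets, columns):
        rel = math.atan((u - width / 2) / focal_px)
        est = bearing(observer, target) - rel
        s += math.sin(est)
        c += math.cos(est)
    return math.atan2(s, c)

# Illustrative values only (not taken from the paper).
observer = (0.0, 0.0)
targets = [(1.0, 5.0), (-2.0, 6.0)]
true_yaw, focal_px, width = 0.1, 800.0, 1280

# Columns where a detector would ideally find the targets.
detected = [project_column(observer, t, true_yaw, focal_px, width) for t in targets]

# Yaw recovered from the detections, usable in place of a noisy compass reading.
calibrated_yaw = calibrate_yaw(observer, targets, detected, focal_px, width)
```

In this toy setup, a compass reading that is off by a fraction of a radian shifts every back-projected column by many pixels, whereas the yaw recovered from the detections reproduces the detected columns. A real system would also need to handle pitch and roll, detection noise, and ambiguous matches between detections and identities, which the sketch ignores.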