Human-centric indoor environment modeling from depth videos
ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume 2
Discriminative learning with latent variables for cluttered indoor scene understanding
Communications of the ACM
Finding happiest moments in a social context
ACCV'12 Proceedings of the 11th Asian conference on Computer Vision - Volume Part II
Hi-index | 0.02 |
We present a method whereby an embodied agent using visual perception can efficiently create a model of a local indoor environment from its experience of moving within it. Our method uses motion cues to compute likelihoods of indoor structure hypotheses, based on simple, generic geometric knowledge about points, lines, planes, and motion. We present a single-image analysis, not to attempt to identify a single accurate model, but to propose a set of plausible hypotheses about the structure of the environment from an initial frame. We then use data from subsequent frames to update a Bayesian posterior probability distribution over the set of hypotheses. The likelihood function is efficiently computable by comparing the predicted location of point features on the environment model to their actual tracked locations in the image stream. Our method runs in real-time, and it avoids the need of extensive prior training and the Manhattan-world assumption, which makes it more practical and efficient for an intelligent robot to understand its surroundings compared to most previous scene understanding methods. Experimental results on a collection of indoor videos suggest that our method is capable of an unprecedented combination of accuracy and efficiency.