Object Detection using Geometrical Context Feedback
International Journal of Computer Vision
Unsupervised discovery of mid-level discriminative patches
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part II
People watching: human actions as a cue for single view geometry
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part V
Joint reasoning about objects and 3D scene layout has shown great promise in scene interpretation. One visual cue that has been overlooked is texture arising from a spatial repetition of objects in the scene (e.g., windows of a building). Such texture provides scene-specific constraints among objects, and thus facilitates scene interpretation. We present an approach to: (1) detecting distinct textures of objects in a scene, (2) reconstructing the 3D shape of detected texture surfaces, and (3) combining object detections and shape-from-texture toward a globally consistent scene interpretation. Inference is formulated within the reinforcement learning framework as a sequential interpretation of image regions, starting from confident regions to guide the interpretation of other regions. Our algorithm finds an optimal policy that maps states of detected objects and reconstructed surfaces to actions which ought to be taken in those states, including detecting new objects and identifying new textures, so as to minimize a long-term loss. Tests against ground truth obtained from stereo images demonstrate that we can coarsely reconstruct a 3D model of the scene from a single image, without learning the layout of common scene surfaces, as done in prior work. We also show that reasoning about texture of objects improves object detection.
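The idea of a policy that maps states to actions so as to minimize a long-term loss can be illustrated with a minimal sketch. This is not the authors' algorithm: it is plain value iteration on a tiny deterministic decision process. The states (integers standing for interpretation progress), the action names `detect_object` and `identify_texture`, and all loss values are hypothetical, chosen only to show that the action with the lowest immediate loss is not always the one with the lowest long-term loss.

```python
from typing import Dict, Set, Tuple

State = int      # hypothetical: stage of scene interpretation reached so far
Action = str     # hypothetical action names, not taken from the paper

# transitions[(state, action)] = (next_state, immediate_loss); all values are made up.
TRANSITIONS: Dict[Tuple[State, Action], Tuple[State, float]] = {
    (0, "detect_object"): (1, 1.0),
    (0, "identify_texture"): (2, 2.0),
    (1, "detect_object"): (2, 4.0),
    (1, "identify_texture"): (3, 3.0),
    (2, "detect_object"): (3, 0.5),
}
TERMINAL: Set[State] = {3}  # interpretation is complete in state 3


def value_iteration(
    transitions: Dict[Tuple[State, Action], Tuple[State, float]],
    terminal: Set[State],
    gamma: float = 1.0,
    tol: float = 1e-9,
    max_iters: int = 1000,
) -> Tuple[Dict[State, float], Dict[State, Action]]:
    """Return the minimal cumulative loss per state and a loss-minimizing policy."""
    states = {s for s, _ in transitions}
    states |= {nxt for nxt, _ in transitions.values()}
    states |= set(terminal)
    value = {s: 0.0 for s in states}

    def action_costs(s: State) -> Dict[Action, float]:
        # Long-term loss of each action available in s under the current estimate.
        return {
            a: loss + gamma * value[nxt]
            for (s0, a), (nxt, loss) in transitions.items()
            if s0 == s
        }

    for _ in range(max_iters):
        delta = 0.0
        for s in states:
            costs = action_costs(s)
            if s in terminal or not costs:
                continue
            best = min(costs.values())
            delta = max(delta, abs(best - value[s]))
            value[s] = best
        if delta < tol:
            break

    policy = {}
    for s in states:
        costs = action_costs(s)
        if s not in terminal and costs:
            policy[s] = min(costs, key=costs.get)
    return value, policy


value, policy = value_iteration(TRANSITIONS, TERMINAL)
print(policy[0], value[0])
```

In this toy setup, `detect_object` has the lower immediate loss in state 0, but the policy selects `identify_texture` there because the total loss along that path is lower.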