Interaction between modules in learning systems for vision applications

  • Authors:
  • Thomas S. Huang; Amit Sethi

  • Affiliations:
  • University of Illinois at Urbana-Champaign; University of Illinois at Urbana-Champaign

  • Year:
  • 2006

Abstract

Complex vision tasks, such as event detection in a surveillance video, can be divided into subtasks such as human detection, tracking, and trajectory analysis. The video can be thought of as being composed of various features, which can be roughly arranged in a hierarchy from low-level to high-level: low-level features include edges and blobs, while high-level features include objects and events. Loosely speaking, low-level feature extraction is based on signal/image processing techniques, while high-level feature extraction is based on machine learning techniques. Traditionally, vision systems extract features in a feedforward manner along this hierarchy; that is, certain modules extract low-level features, and other modules use these low-level features to extract high-level features. Along with others in the research community, we have worked with this design approach. We briefly present our work on object recognition and multiperson tracking systems designed this way and highlight its advantages and shortcomings. Our focus, however, is on system design methods that allow tight feedback between the layers of the feature hierarchy, as well as among the high-level modules themselves. We review previous research on systems with feedback and discuss the strengths and limitations of these approaches. This analysis leads us to a new framework for designing complex vision systems that allows tight feedback in a hierarchy of features and of the modules that extract them, using a graphical representation. The framework is based on factor graphs: it relaxes some of the constraints of traditional factor graphs and replaces their function nodes with modified versions of modules developed for specific vision tasks. Such modules can be formulated by slightly modifying modules developed for other vision systems, provided their input and output variables can be matched to variables in our graphical structure. The framework also draws inspiration from the product of experts and the free-energy view of the EM algorithm. We present experimental results and discuss the path for future development.
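
To make the factor-graph idea concrete, here is a minimal sketch, not the authors' implementation: vision modules are wrapped as function (factor) nodes over shared variables, and sum-product message passing supplies the tight feedback between a low-level feature variable and a high-level object variable. All names, variable domains, and factor tables in this sketch (`Variable`, `Factor`, `beliefs`, and the blob/object example) are hypothetical placeholders chosen for illustration.

```python
import numpy as np

class Variable:
    def __init__(self, name, n_states):
        self.name, self.n_states, self.factors = name, n_states, []

class Factor:
    """A 'module' exposed as a factor node: a nonnegative table over its variables."""
    def __init__(self, variables, table):
        self.variables = variables
        self.table = np.asarray(table, dtype=float)
        for v in variables:
            v.factors.append(self)

def beliefs(variables, factors, n_iters=10):
    """Sum-product message passing; returns a normalized marginal per variable."""
    msg_fv = {(f, v): np.ones(v.n_states) for f in factors for v in f.variables}
    msg_vf = {(v, f): np.ones(v.n_states) for f in factors for v in f.variables}
    for _ in range(n_iters):
        for f in factors:                      # factor -> variable messages
            for v in f.variables:
                t = f.table.copy()
                for i, u in enumerate(f.variables):
                    if u is not v:             # fold in messages from other variables
                        shape = [1] * t.ndim
                        shape[i] = u.n_states
                        t = t * msg_vf[(u, f)].reshape(shape)
                axes = tuple(i for i, u in enumerate(f.variables) if u is not v)
                m = t.sum(axis=axes)           # marginalize everything except v
                msg_fv[(f, v)] = m / m.sum()
        for f in factors:                      # variable -> factor messages
            for v in f.variables:
                m = np.ones(v.n_states)
                for g in v.factors:
                    if g is not f:
                        m = m * msg_fv[(g, v)]
                msg_vf[(v, f)] = m / m.sum()
    result = {}
    for v in variables:                        # belief = product of incoming messages
        b = np.ones(v.n_states)
        for g in v.factors:
            b = b * msg_fv[(g, v)]
        result[v.name] = b / b.sum()
    return result

# Hypothetical two-level hierarchy: a low-level blob variable and a
# high-level object variable, coupled by a compatibility factor.
blob = Variable("blob_present", 2)
obj = Variable("object_present", 2)
low_level = Factor([blob], [0.4, 0.6])        # stand-in for a blob detector's score
prior = Factor([obj], [0.7, 0.3])             # stand-in for a scene-level module
compat = Factor([blob, obj], [[0.9, 0.2],     # compatibility of blob state
                              [0.1, 0.8]])    # with object state
print(beliefs([blob, obj], [low_level, prior, compat]))
```

Running the sketch shows the feedback at work: the posterior over `blob_present` is shaped not only by the low-level detector's score but also by the high-level prior on `object_present`, propagated back down through the compatibility factor.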