Object categorization

  • Authors: Axel Pinz
  • Affiliations: Graz University of Technology, Austria
  • Venue: Foundations and Trends® in Computer Graphics and Vision
  • Year: 2005

Quantified Score

Hi-index 0.02

Abstract

This article presents foundations, original research and trends in the field of object categorization by computer vision methods. The research goals in object categorization are to detect objects in images and to determine the objects' categories. Categorization aims for the recognition of generic classes of objects, and thus has also been termed 'generic object recognition'. This is in contrast to the recognition of specific, individual objects. While humans are usually better at generic than at specific recognition, categorization is much harder to achieve for today's computer architectures and algorithms. Major problems are related to the concept of a 'visual category', where a successful recognition algorithm has to manage large intra-class variability versus sometimes marginal inter-class differences. It turns out that several techniques which are useful for specific recognition can also be adapted to categorization, but there are also a number of recent developments in learning, representation and detection that are especially tailored to categorization. Recent results have established various categorization methods that are based on local salient structures in the images. Some of these methods use just a 'bag of keypoints' model. Others include a certain amount of geometric modeling of 2D spatial relations between parts, or 'constellations' of parts. There is now a certain maturity in these approaches and they achieve excellent recognition results on rather complex image databases. Further work focused on the description of shape and object contour for categorization is only just emerging. However, there remain a number of important open questions, which also define current and future research directions. These issues include localization abilities, required supervision, the handling of many categories, online and incremental learning, and the use of a 'visual alphabet', to name a few.
These aspects are illustrated by the discussion of several current approaches, including our own patch-based system and our boundary-fragment model. The article closes with a summary and a discussion of promising future research directions.