Text-Based Approaches for the Categorization of Images

  • Authors:
  • Carl L. Sable; Vasileios Hatzivassiloglou

  • Venue:
  • ECDL '99: Proceedings of the Third European Conference on Research and Advanced Technology for Digital Libraries
  • Year:
  • 1999

Abstract

The rapid expansion of multimedia digital collections brings to the fore the need for classifying not only text documents but their embedded non-textual parts as well. We propose a model for basing classification of multimedia on broad, non-topical features, and show how information on targeted nearby pieces of text can be used to effectively classify photographs on a first such feature, distinguishing between indoor and outdoor images. We examine several variations to a TF*IDF-based approach for this task, empirically analyze their effects, and evaluate our system on a large collection of images from current news newsgroups. In addition, we investigate alternative classification and evaluation methods, and the effect that a secondary feature can have on indoor/outdoor classification. We obtain a classification accuracy of 82%, a number that clearly outperforms baseline estimates and competing image-based approaches and nears the accuracy of humans who perform the same task with access to comparable information.
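
To make the idea of a TF*IDF-based text classifier for images concrete, here is a minimal sketch. It is not the authors' system: the abstract does not spell out the variations they tested, so this shows only a generic centroid-style scheme in which the text near an image is compared, by cosine similarity, against one TF*IDF vector per category. The function names (tokenize, train, classify), the whitespace tokenizer, the smoothed IDF formula, and the four toy captions are all assumptions made for illustration.

```python
import math
from collections import Counter

# Toy captions invented for illustration; they are not from the paper's corpus.
TRAINING = [
    ("fans cheer in the stadium under a sunny sky", "outdoor"),
    ("hikers walk along a mountain trail", "outdoor"),
    ("the senator speaks at a press conference in his office", "indoor"),
    ("judge listens in the courtroom", "indoor"),
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple assumption)."""
    return text.lower().split()

def train(labeled_captions):
    """Build IDF weights and one TF*IDF vector per category."""
    docs = [(Counter(tokenize(text)), label) for text, label in labeled_captions]
    doc_freq = Counter()
    for counts, _ in docs:
        doc_freq.update(counts.keys())
    # Smoothed IDF: the +1 keeps words that occur in every caption from vanishing.
    idf = {w: math.log(len(docs) / df) + 1.0 for w, df in doc_freq.items()}
    # Sum term frequencies over all captions that share a label.
    category_tf = {}
    for counts, label in docs:
        category_tf.setdefault(label, Counter()).update(counts)
    category_vecs = {
        label: {w: tf * idf[w] for w, tf in counts.items()}
        for label, counts in category_tf.items()
    }
    return idf, category_vecs

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(text, idf, category_vecs):
    """Return the category whose TF*IDF vector is most similar to the text."""
    counts = Counter(tokenize(text))
    # Words never seen in training carry no weight and are ignored.
    vec = {w: tf * idf[w] for w, tf in counts.items() if w in idf}
    return max(sorted(category_vecs), key=lambda label: cosine(vec, category_vecs[label]))

idf, category_vecs = train(TRAINING)
print(classify("protesters march under a cloudy sky", idf, category_vecs))
print(classify("the minister speaks in his office", idf, category_vecs))
```

A real system along these lines would need far more training text, and choices such as which nearby text to use or how to weight terms are exactly the kind of variations the abstract says were analyzed empirically.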