Words and pictures in the news

  • Authors: Jaety Edwards; Ryan White; David Forsyth

  • Affiliations: UC Berkeley; UC Berkeley; UC Berkeley

  • Venue: HLT-NAACL-LWM '04 Proceedings of the HLT-NAACL 2003 workshop on Learning word meaning from non-linguistic data - Volume 6

  • Year: 2003

Abstract

We discuss the properties of a collection of news photos and captions, collected from the Associated Press and Reuters. Captions have a vocabulary dominated by proper names. We have implemented various text clustering algorithms to organize these items by topic, as well as an iconic matcher that identifies articles that share a picture. We have found that the special structure of captions allows us to extract some names of people actually portrayed in the image quite reliably, using a simple syntactic analysis. We have been able to build a directory of face images of individuals from this collection.
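The abstract does not spell out the "simple syntactic analysis" used to find the names of people portrayed in an image. As a rough illustration only, the sketch below assumes one plausible cue: a run of capitalized words immediately followed by a positional marker such as "(L)" or "(R)". The pattern, the function name `pictured_names`, and the sample captions are all hypothetical and are not taken from the paper.

```python
import re

# Assumption: a "name" is two or more capitalized words in a row, e.g. "Jane Doe".
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)+"

# Assumption: a name followed by a positional marker "(L)", "(R)" or "(C)"
# refers to someone who is actually visible in the photo.
CUE = re.compile(rf"({NAME})(?= \((?:L|R|C)\))")


def pictured_names(caption: str) -> list[str]:
    """Return names that the caption marks with a positional cue."""
    return CUE.findall(caption)


if __name__ == "__main__":
    # Made-up captions, used only to exercise the pattern.
    print(pictured_names("Jane Doe (L) shakes hands with John Smith (R) in Geneva."))
    print(pictured_names("A supporter of Jane Doe waves a flag in Geneva."))
```

In the second caption a person is named but carries no positional cue, so this sketch returns nothing for it. That mirrors the distinction the abstract draws between names that appear in a caption and names of people actually portrayed in the image.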